Abstract. Under the assumption that the true density is decreasing, it is well known that the Grenander estimator converges at rate $n^{-1/3}$ if the true density is curved (Prakasa Rao, 1969) and at rate $n^{-1/2}$ if the density is flat (Groeneboom and Pyke, 1983; Carolan and Dykstra, 1999). In the case that the true density is misspecified, the results of Patilea (2001) tell us that the global convergence rate is of order $n^{-1/3}$ in Hellinger distance. Here, we show that the local convergence rate is $n^{-1/2}$ at a point where the density is misspecified. This is not in contradiction with the results of Patilea (2001): the global convergence rate simply comes from locally curved well-specified regions. Furthermore, we study global convergence under misspecification by considering linear functionals. The rate of convergence is $n^{-1/2}$, and we show that the limit is made up of two independent terms: a mean-zero Gaussian term and a second term (with non-zero mean) which is present only if the density has well-specified locally flat regions.



1. Introduction 



Shape-constrained nonparametric maximum likelihood estimators provide an intriguing alternative to kernel-based density estimators. For example, one can compare the standard histogram with the Grenander estimator for a decreasing density. Rules exist to pick the bandwidth (or bin width) for the histogram to attain optimal convergence rates, cf. Wasserman (2006). On the other hand, the Grenander estimator gives a piecewise constant density, or histogram, but the bin widths are now chosen completely automatically by the estimator. Furthermore, the bin widths selected by the Grenander estimator are naturally locally adaptive (Birgé, 1987). Similar comparisons can also be made between the log-concave nonparametric MLE and the kernel density estimator with, say, the Gaussian kernel.



The Grenander estimator was first introduced in Grenander (1956) and has been considered extensively in the literature since then. A recent review of the history of the problem appears in Durot et al. (2012). The latter paper establishes that the Grenander estimator converges to a true strictly decreasing density at a rate of $(n/\log n)^{-1/3}$ in the supremum norm. Other rates have also been derived over the years, most notably convergence at a point at a rate of $n^{-1/3}$ if the true density is locally strictly decreasing (Prakasa Rao, 1969; Groeneboom, 1985) and at a rate of $n^{-1/2}$ if the true density is locally flat (Groeneboom, 1983; Carolan and Dykstra, 1999).



As noted in Cule et al. (2010) and Dümbgen et al. (2011), the "success story" of maximum likelihood estimators is their robustness. Namely, let $\mathcal F$ denote the space of decreasing densities on $\mathbb R_+$. Next, let $f_0$ denote the true density and $\tilde f_0$ denote the density closest to $f_0$ in the Kullback-Leibler sense. That is,
$$\tilde f_0 = \operatorname*{argmin}_{g\in\mathcal F}\int_0^\infty f_0(x)\log\frac{f_0(x)}{g(x)}\,dx.$$
We will call the density $\tilde f_0$ the KL projection density of $f_0$, or the KL projection for short. Note that if $f_0\in\mathcal F$ then $\tilde f_0 = f_0$. Patilea (2001) showed that the density $\tilde f_0$ exists and is unique, and that the Grenander estimator converges to $\tilde f_0$ when the observed samples come from the true density $f_0$, regardless of whether $f_0\in\mathcal F$. Similar results appear in … (2010) and Cule et al. (2010).

Let $\tilde f_n$ denote the Grenander estimator of a decreasing density. We show here that at a point where the density is misspecified the rate of convergence of $\tilde f_n$ to $\tilde f_0$ is $n^{-1/2}$, and we also identify the limiting distribution. This is not in contradiction with the results of Patilea (2001): the slower $n^{-1/3}$ global convergence rate simply comes from locally curved well-specified regions. To be more specific, if the density $f_0$ is misspecified at a point, then $\tilde F_0$ must be linear there (and $\tilde f_0$ is flat), and in regions where $\tilde f_0$ is flat the rate of convergence is $n^{-1/2}$. In fact, the $n^{-1/2}$ rate holds at all flat regions of $\tilde f_0$, irrespective of whether these are misspecified or well-specified. The complete result is given in Section 2, where some properties of $\tilde f_0$ are also discussed.
Next, we consider convergence of linear functionals. Let
$$\tilde\mu_0(g) = \int_0^\infty g(x)\tilde f_0(x)\,dx\qquad\text{and}\qquad\tilde\mu_n(g) = \int_0^\infty g(x)\tilde f_n(x)\,dx.\tag{1.1}$$
In Section 3 we show that $n^{1/2}(\tilde\mu_n(g)-\tilde\mu_0(g)) = O_p(1)$, and we again identify the limiting distribution. Notably, the limit is made up of two independent terms: a mean-zero Gaussian term and a second term with non-zero mean. Furthermore, the second term is present only if the density has well-specified locally flat regions. Our results apply to a wide range of KL projections with both strictly curved and flat regions. The work in the strictly curved case follows from the rates of convergence of $\tilde F_n(y) = \int_0^y\tilde f_n(u)\,du$ to the empirical distribution function $\mathbb F_n$ established in Kiefer and Wolfowitz (1976). However, as mentioned above, strictly curved regions are only ever well-specified regions of $\tilde f_0$. A related work here is that of Kulikov and Lopuhaä (2008), who consider functionals in the strictly curved case but at the distribution function level.

In Section 4 we go beyond the linear setting, and consider convergence of the entropy functional in the misspecified case. The limit in this case is Gaussian, irrespective of the local properties of $\tilde f_0$. Most proofs appear in Section 5, and some technical details are left to the Appendix. Throughout, our results are illustrated by reproducible simulations. Code for these is available online at www.math.yorku.ca/~hkj/.
To the best of our knowledge, previous work on asymptotics under misspecification in the shape-constrained context is limited to the rates established in van de Geer (2000) and Patilea (2001), as well as the more recent results of Balabdaoui et al. (2012). In Balabdaoui et al. (2012), the pointwise asymptotic distribution under misspecification was derived for the log-concave probability mass function.

The implications of the new results obtained here are as follows. First, we now understand that $\tilde f_0$ will be made up of local well-specified and misspecified regions, and that the rate of convergence in the misspecified regions is always $n^{-1/2}$. We conjecture that this type of behaviour will be seen in other situations, such as the log-concave setting for $d = 1$. That is, the rate of convergence in misspecified regions will be $n^{-1/2}$, whereas in well-specified regions the rate of convergence will depend on whether locally the density lies on the boundary or the interior of the underlying space. In the log-concave $d = 1$ case, this "interior" rate is known to be $n^{-2/5}$ (Balabdaoui et al., 2009). A generally accepted conjecture is that here, the "boundary" rate will again be $n^{-1/2}$. The interesting case of $d > 1$ is more mysterious, though, as the relationship between the slower boundary points and faster interior points is harder to identify. Thus, our results provide new insight on the limiting distribution even in the completely well-specified context.

Secondly, we show that linear functionals (as well as the nonlinear entropy functional) converge at rate $n^{-1/2}$, and we also conjecture that this behaviour will continue to hold for other shape constraints. Let $\mu_0(g) = \int_0^\infty g(x)f_0(x)\,dx$. Our results show that
$$\sqrt n\big(\tilde\mu_n(g)-\mu_0(g)\big) = \sqrt n\big(\tilde\mu_0(g)-\mu_0(g)\big)+O_p(1).\tag{1.2}$$
Therefore, global rates of divergence are $n^{1/2}$ for linear functionals in the misspecified case. A similar statement also holds for the entropy functional, and here the random $O_p(1)$ term is always Gaussian. Such results are well understood in parametric settings, and are key in power calculations. The exact conditions necessary for (1.2) to hold are given in Section 3 for $\tilde\mu_n(g)$ and in Section 4 for the entropy. Our work can also be easily extended to locally misspecified settings such as those studied in Le Cam (1960).



Lastly, the fact that the limiting distribution of the linear functional $\tilde\mu_n(g)$ depends on properties of $\tilde f_0$, whereas the limiting distribution of the entropy functional is always Gaussian, makes the entropy functional potentially more appealing in terms of testing procedures. A hypothesis test based on a modified version of the entropy functional was considered in Cule et al. (2010), whereas the "trace test" of Chen and Samworth (2011) depends on a nearly linear functional, the variance. Both, however, are developed in the context of log-concavity, and it would be of great interest to extend the results presented here to that setting, particularly for higher dimensions.
Figure 1. Two examples of $f_0$ and $\tilde f_0 = \mathrm{gren}(F_0)$. The two left panels show the cdf and density for example (2.1), while the two right panels show the cdf and density for example (2.2). $F_0$ (resp. $f_0$) is shown in black, and $\tilde F_0$ (resp. $\tilde f_0$) is shown in gray, but only if different from the truth (namely $F_0$ and $f_0$, respectively).



2. The Kullback-Leibler projection and pointwise convergence under misspecification

Properties of the KL projection onto the space of log-concave densities were studied in Dümbgen et al. (2011). When projecting onto the space of decreasing densities, the behaviour is a little easier to characterize.



Theorem 2.1 (Patilea, 1997, 2001). Let $f_0$ be a density with support on $[0,\infty)$, and let $F_0(x) = \int_0^x f_0(u)\,du$. Then $\tilde f_0$ exists and is unique. Moreover, $\tilde f_0$ is found as the (left) derivative of $\tilde F_0$, the least concave majorant of $F_0$.

Thus, in our setting, we have a complete graphical representation of the distribution function $\tilde F_0$ of the KL projection. This representation makes it possible to calculate $\tilde f_0$ in many cases. It also allows us to easily visualize the various $F_0$ which yield the same $\tilde f_0$. Moreover, the representation is key in understanding the dynamics of the estimator, both on a finite sample and asymptotic level. Therefore, for a function $g$ we define the operator $\mathrm{gren}(g)$ to denote the (left) derivative of the least concave majorant of $g$. When the least concave majorant is restricted to a set $[a,b]$, we will write $\mathrm{gren}_{[a,b]}(g)$.
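The graphical characterization in Theorem 2.1 also suggests a simple numerical approximation of gren. The following Python sketch is our own illustration, not the code used for the figures; the helper names `lcm_knots` and `gren` are hypothetical. It evaluates a distribution function on a fine grid, computes the least concave majorant by a monotone-chain upper hull, and returns the left derivative at a point. Applied to the distribution functions of examples (2.1) and (2.2) given below, it should recover the flat parts of the KL projection, namely a value close to 0.5 on (0.5, 1] for (2.1) and close to 0.75 on (0.25, 1] for (2.2).

```python
import numpy as np


def lcm_knots(x, y):
    """Vertices of the least concave majorant of the points (x[i], y[i]).

    Assumes x is strictly increasing; uses a monotone-chain upper hull.
    """
    hull = []
    for px, py in zip(np.asarray(x, float).tolist(), np.asarray(y, float).tolist()):
        # Drop the last vertex while it lies on or below the chord to (px, py).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    hx, hy = (np.array(v) for v in zip(*hull))
    return hx, hy


def gren(x, y, x0):
    """Left derivative at x0 of the least concave majorant of (x, y)."""
    hx, hy = lcm_knots(x, y)
    slopes = np.diff(hy) / np.diff(hx)
    k = np.searchsorted(hx, x0, side="left")  # first vertex >= x0
    return slopes[k - 1]


def F0_ex21(x):
    """CDF of (2.1): f0 = 1.5 on [0, 0.5] and f0 = x - 0.25 on (0.5, 1]."""
    return np.where(x <= 0.5, 1.5 * x, 0.75 + x**2 / 2 - 0.25 * x)


def F0_ex22(x):
    """CDF of (2.2): f0 = 12 (x - 0.5)^2 off (0.4, 0.6) and f0 = 0.04 on it."""
    left = 4.0 * ((x - 0.5) ** 3 + 0.125)
    middle = 0.496 + 0.04 * (x - 0.4)
    right = 0.504 + 4.0 * ((x - 0.5) ** 3 - 0.001)
    return np.where(x <= 0.4, left, np.where(x < 0.6, middle, right))


grid = np.linspace(0.0, 1.0, 20001)
print(gren(grid, F0_ex21(grid), 0.75))  # should be close to 0.5
print(gren(grid, F0_ex22(grid), 0.75))  # should be close to 0.75
```

On the well-specified parts the sketch simply reproduces $f_0$; for instance, at $x = 0.1$ in (2.2) it returns approximately $12(0.1-0.5)^2 = 1.92$.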

Let $S_0$ denote the support of $f_0$. We write $S_0 = \mathcal M\cup\mathcal W$, where $\mathcal M = \{x\ge0: \tilde F_0(x) > F_0(x)\}$ and $\mathcal W = \{x\ge0: \tilde F_0(x) = F_0(x)\}$. Since $f_0$ is a density, it follows that $F_0$ is continuous, as is $\tilde F_0$, and therefore $\mathcal W$ is a closed set and $\mathcal M$ is open. For a fixed point $x_0\in\mathcal M$, we thus know that $x_0$ lies in some open interval contained in $\mathcal M$. Indeed, let $a_0 = \sup\{x<x_0: \tilde F_0(x) = F_0(x)\}$ and $b_0 = \inf\{x>x_0: \tilde F_0(x) = F_0(x)\}$. Then $x_0\in(a_0,b_0)$ with $(a_0,b_0)\subset\mathcal M$.
Two examples are given in Figure 1. For the first example we have
$$f_0(x) = \begin{cases}1.5, & x\in[0,0.5],\\ x-0.25, & x\in(0.5,1].\end{cases}\tag{2.1}$$
Here $\mathcal M = (0.5,1)$ and $\mathcal W = [0,0.5]\cup\{1\}$. For the second example we have
$$f_0(x) = \begin{cases}12(x-0.5)^2, & x\in[0,0.4]\cup[0.6,1],\\ 0.04, & x\in(0.4,0.6).\end{cases}\tag{2.2}$$
Here $\mathcal M = (0.25,1)$ and $\mathcal W = [0,0.25]\cup\{1\}$. The next proposition gives some additional properties of the KL projection.

Proposition 2.2. The KL projection density, $\tilde f_0$, satisfies the following:

(1) Fix $x_0\in\mathcal M$ and define $a_0, b_0$ as above. Then $b_0<\infty$, and $\tilde f_0$ is constant on $(a_0,b_0]$ and satisfies the mean-value property
$$\tilde f_0(x_0) = \frac{1}{b_0-a_0}\int_{a_0}^{b_0}f_0(x)\,dx.$$

(2) Suppose that $\int_0^\infty f_0^2(x)\,dx<\infty$. Then $\tilde f_0 = \operatorname*{argmin}_{g\in\mathcal F}\int_0^\infty(g(x)-f_0(x))^2\,dx$.

(3) For any increasing function $h$, $\int_0^\infty h(x)\tilde f_0(x)\,dx\le\int_0^\infty h(x)f_0(x)\,dx$.

(4) Let $\Phi$ denote a non-negative convex function and suppose that $g_0$ is decreasing. Then …

(5) Let $g_0\in\mathcal F$ and let $G_0(y) = \int_0^y g_0(x)\,dx$. Then
$$\sup_{x\ge0}|\tilde F_0(x)-G_0(x)|\le\sup_{x\ge0}|F_0(x)-G_0(x)|.$$
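Points (1), (3), and (5) can be checked numerically on example (2.2). The sketch below is again our own illustration; it reuses a hypothetical hull helper to approximate $\tilde F_0$ on a grid, verifies the mean-value property on $(a_0,b_0] = (0.25,1]$, compares the means of $\tilde f_0$ and $f_0$ (point (3) with $h(x) = x$), and compares the two suprema in Marshall's lemma with $g_0$ taken, as an arbitrary choice, to be the uniform density on $[0,1]$.

```python
import numpy as np


def lcm_knots(x, y):
    """Vertices of the least concave majorant of (x[i], y[i]), x increasing."""
    hull = []
    for px, py in zip(np.asarray(x, float).tolist(), np.asarray(y, float).tolist()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()  # last vertex is on or below the chord
            else:
                break
        hull.append((px, py))
    hx, hy = (np.array(v) for v in zip(*hull))
    return hx, hy


def F0_ex22(x):
    """CDF of example (2.2)."""
    left = 4.0 * ((x - 0.5) ** 3 + 0.125)
    middle = 0.496 + 0.04 * (x - 0.4)
    right = 0.504 + 4.0 * ((x - 0.5) ** 3 - 0.001)
    return np.where(x <= 0.4, left, np.where(x < 0.6, middle, right))


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


grid = np.linspace(0.0, 1.0, 20001)
F0 = F0_ex22(grid)
hx, hy = lcm_knots(grid, F0)
F0_tilde = np.interp(grid, hx, hy)  # grid approximation of the LCM of F0

# (1) Mean-value property on (a0, b0] = (0.25, 1]; the KL projection is 0.75 there.
mean_value = float((F0_ex22(1.0) - F0_ex22(0.25)) / (1.0 - 0.25))

# (3) With h(x) = x: for a CDF F on [0, 1], the mean is int_0^1 (1 - F(x)) dx.
mean_tilde = trapezoid(1.0 - F0_tilde, grid)
mean_true = trapezoid(1.0 - F0, grid)

# (5) Marshall's lemma with g0 the uniform density on [0, 1], so G0(x) = x.
marshall_lhs = float(np.max(np.abs(F0_tilde - grid)))
marshall_rhs = float(np.max(np.abs(F0 - grid)))
print(mean_value, mean_tilde, mean_true, marshall_lhs, marshall_rhs)
```

Since $f_0$ in (2.2) is symmetric about $1/2$, its mean is $1/2$, while the KL projection has the smaller mean $101/256 = 0.39453125$, as point (3) predicts.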

Point (3) above tells us that if $g$ is increasing then $\mu_0(g)\ge\tilde\mu_0(g)$. Point (5) is Marshall's Lemma (Marshall, 1970). The proof of Proposition 2.2 appears in the Appendix. The next theorem is our first main result.

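Theorem 2.3. Fix a point $x_0\in\mathcal M$, and let $[a,b]$ denote the largest interval $I$ containing $x_0$ such that $\tilde F_0$ is linear on $I$. Let $\mathbb U$ denote a standard Brownian bridge process on $[0,1]$, and let $\mathbb U_{F_0}(x) = \mathbb U(F_0(x))$ for $x\in S_0$. Then
$$\sqrt n\big(\tilde f_n(x_0)-\tilde f_0(x_0)\big)\Rightarrow\mathrm{gren}_{[a,b]}\big(\mathbb U^{\mathrm{mod}}_{F_0}\big)(x_0),$$
where
$$\mathbb U^{\mathrm{mod}}_{F_0}(u) = \begin{cases}\mathbb U_{F_0}(u), & u\in[a,b]\cap\mathcal W,\\ -\infty, & u\in[a,b]\cap\mathcal M.\end{cases}$$
If it happens that $[a,b]\cap\mathcal W = \{a,b\}$, then
$$\sqrt n\big(\tilde f_n(x_0)-\tilde f_0(x_0)\big)\Rightarrow\sigma Z,$$
where $Z$ is a standard normal random variable and
$$\sigma^2 = \frac{\tilde f_0(x_0)}{b-a}-\tilde f_0^2(x_0).$$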



Recall that Patilea (2001, Corollary 5.6) shows that the rate of convergence (in Hellinger distance) of $\tilde f_n$ to $\tilde f_0$ is $n^{-1/3}$. The above theorem shows that the local rate of convergence will be $n^{-1/2}$ where the KL projection is flat. When the KL density is curved, the KL density and true density are actually equal, and hence the convergence rate from the correctly specified case applies. The next formulation of the limiting process is similar to that of Carolan and Dykstra (1999) for a density with a flat region on $[a,b]$.

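Remark 2.4. Let $p_0 = \tilde F_0(b)-\tilde F_0(a) = F_0(b)-F_0(a)$. Since $\tilde F_0$ is linear on $[a,b]$, the limiting distribution may also be expressed as
$$\mathrm{gren}_{[a,b]}\big(\mathbb U^{\mathrm{mod}}_{F_0}\big)(x_0)\stackrel{d}{=}\frac{1}{b-a}\left\{Z+\sqrt{p_0}\,\mathrm{gren}\big(\mathbb U^{\mathrm{mod}}\big)\!\left(\frac{x_0-a}{b-a}\right)\right\},$$
where $Z$ is a mean-zero normal random variable with variance $p_0(1-p_0)$, $\mathbb U$ is an independent standard Brownian bridge, and
$$\mathbb U^{\mathrm{mod}}(u) = \begin{cases}\mathbb U(u), & u\in([a,b]\cap\mathcal W-a)/(b-a),\\ -\infty, & u\in([a,b]\cap\mathcal M-a)/(b-a).\end{cases}$$
Notably, if $[a,b]\cap\mathcal W = \{a,b\}$, then $\mathrm{gren}(\mathbb U^{\mathrm{mod}})(u) = 0$.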

Figure 2 illustrates the theory. The convergence is surprisingly fast, although it appears to be a little slower in the second example (2.2). We postulate that this difference is caused by the presence or absence of a strictly curved region of $\tilde f_0$.
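A small Monte Carlo experiment in the spirit of Figure 2 takes only a few lines of Python. In the sketch below, the sample size $n = 1000$, the number of replications $B = 400$, the seed, and the helper names are our own hypothetical choices. Data are drawn from (2.1), the Grenander estimator is evaluated at $x_0 = 0.75$, where $[a,b] = [0.5,1]$ and $[a,b]\cap\mathcal W = \{a,b\}$, and the sample variance of $\sqrt n(\tilde f_n(x_0)-\tilde f_0(x_0))$ is compared with $\sigma^2 = \tilde f_0(x_0)/(b-a)-\tilde f_0^2(x_0) = 0.75$ from Theorem 2.3.

```python
import numpy as np


def lcm_knots(x, y):
    """Vertices of the least concave majorant of (x[i], y[i]), x increasing."""
    hull = []
    for px, py in zip(np.asarray(x, float).tolist(), np.asarray(y, float).tolist()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()  # last vertex is on or below the chord
            else:
                break
        hull.append((px, py))
    hx, hy = (np.array(v) for v in zip(*hull))
    return hx, hy


def sample_ex21(n, rng):
    """Draw n points from example (2.1) by mixing its two pieces."""
    left = rng.uniform(size=n) < 0.75  # P(X <= 0.5) = 0.75
    x = np.empty(n)
    x[left] = rng.uniform(0.0, 0.5, size=left.sum())
    v = rng.uniform(size=(~left).sum())
    # On (0.5, 1] the conditional CDF is 2x^2 - x; invert it.
    x[~left] = (1.0 + np.sqrt(1.0 + 8.0 * v)) / 4.0
    return x


def grenander_at(sample, x0):
    """Grenander estimator (left derivative of the LCM of the empirical CDF) at x0."""
    xs = np.sort(sample)
    n = xs.size
    hx, hy = lcm_knots(np.concatenate(([0.0], xs)), np.arange(n + 1) / n)
    slopes = np.diff(hy) / np.diff(hx)
    k = np.searchsorted(hx, x0, side="left")
    return slopes[k - 1] if 0 < k < hx.size else 0.0


x0, a, b, f_tilde = 0.75, 0.5, 1.0, 0.5       # example (2.1) at x0 = 0.75
sigma2 = f_tilde / (b - a) - f_tilde**2       # Theorem 2.3 variance

rng = np.random.default_rng(2012)
n, B = 1000, 400
stats = np.array([np.sqrt(n) * (grenander_at(sample_ex21(n, rng), x0) - f_tilde)
                  for _ in range(B)])
print(stats.mean(), stats.var(), sigma2)
```

With these settings the empirical variance should land near 0.75; a normal quantile plot of `stats` gives a single panel comparable to the top row of Figure 2.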



Figure 2. Empirical quantiles of $\sqrt n(\tilde f_n(x_0)-\tilde f_0(x_0))$ vs. the true quantiles of the limiting distribution at the point $x_0 = 0.75$ for $f_0$ given by (2.1) in the top row and (2.2) in the bottom row. The sample size varies from $n = 10$ to $n = 100\,000$. The straight line goes through the origin and has slope one. Each plot is based on $B = 1000$ samples.

Proof of Theorem 2.3. By the switching relation (Balabdaoui et al., 2011), we have
$$\begin{aligned}
P\big(\sqrt n(\tilde f_n(x_0)-\tilde f_0(x_0))\le t\big) &= P\Big(\operatorname*{argmax}_{z\ge0}\big\{\mathbb F_n(z)-(\tilde f_0(x_0)+n^{-1/2}t)z\big\}\le x_0\Big)\\
&= P\Big(\operatorname*{argmax}_{z\ge0}\Big\{\sqrt n\big(\mathbb F_n(z)-\mathbb F_n(a)-(F_0(z)-F_0(a))\big)\\
&\qquad\qquad+\sqrt n\big(F_0(z)-F_0(a)-\tilde f_0(x_0)(z-a)\big)-tz\Big\}\le x_0\Big).
\end{aligned}$$
We now look more closely at the second term. That is,
$$F_0(z)-F_0(a)-\tilde f_0(x_0)(z-a) = \big[F_0(z)-\tilde F_0(z)\big]+\big[\tilde F_0(z)-\tilde F_0(a)-\tilde f_0(x_0)(z-a)\big],$$
noting that $F_0(a) = \tilde F_0(a)$, since $a\in[a,b]\cap\mathcal W$. On the other hand, for all $z\in[a,b]\cap\mathcal M$, we have $\tilde F_0(z)>F_0(z)$, so the first bracket is non-positive and vanishes exactly on $\mathcal W$. Furthermore, $\tilde F_0$ is concave with derivative $\tilde f_0(x_0)$ at any point $z\in(a,b)$, so the second bracket is non-positive and vanishes exactly on $[a,b]$. Hence
$$F_0(z)-F_0(a)-\tilde f_0(x_0)(z-a)\le0$$
for all $z$. For $z\in[a,b]\cap\mathcal W$ this is an equality, and a strict inequality otherwise. Therefore, the weak limit of
$$\sqrt n\big(\mathbb F_n(z)-\mathbb F_n(a)-(F_0(z)-F_0(a))\big)+\sqrt n\big(F_0(z)-F_0(a)-\tilde f_0(x_0)(z-a)\big)$$
is $\mathbb U^{\mathrm{mod}}_{F_0}(z)-\mathbb U_{F_0}(a)$ for all $z\in[a,b]$. For $z\notin[a,b]$, the limit of this process is always $-\infty$, and therefore the maximum must occur inside of $[a,b]$. Thus,
$$\begin{aligned}
P\big(\sqrt n(\tilde f_n(x_0)-\tilde f_0(x_0))\le t\big) &\to P\Big(\operatorname*{argmax}_{z\in[a,b]}\big\{\mathbb U^{\mathrm{mod}}_{F_0}(z)-tz\big\}\le x_0\Big)\\
&= P\big(\mathrm{gren}_{[a,b]}(\mathbb U^{\mathrm{mod}}_{F_0})(x_0)\le t\big),
\end{aligned}$$
by switching again. When $[a,b]\cap\mathcal W = \{a,b\}$, the least concave majorant is simply the line joining $\mathbb U_{F_0}(a)$ and $\mathbb U_{F_0}(b)$, with slope equal to
$$\frac{\mathbb U_{F_0}(b)-\mathbb U_{F_0}(a)}{b-a},$$
a Gaussian random variable with mean zero and variance
$$\frac{(F_0(b)-F_0(a))\big[1-(F_0(b)-F_0(a))\big]}{(b-a)^2} = \frac{\tilde f_0(x_0)}{b-a}-\tilde f_0^2(x_0),$$
where we used $F_0(b)-F_0(a) = \tilde F_0(b)-\tilde F_0(a) = \tilde f_0(x_0)(b-a)$.

□



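Proof of Remark 2.4. Recall that $\tilde F_0$ is linear on $[a,b]$. Therefore, for $x\in[a,b]$, we can write
$$\mathbb U(F_0(x))-\mathbb U(F_0(a)) = \frac{F_0(x)-F_0(a)}{F_0(b)-F_0(a)}\,W+V(x),$$
where
$$W = \mathbb U(F_0(b))-\mathbb U(F_0(a)),\qquad V(x) = \mathbb U(F_0(x))-\mathbb U(F_0(a))-\frac{F_0(x)-F_0(a)}{F_0(b)-F_0(a)}\,W.$$
Since all variables are jointly Gaussian, a careful calculation of the covariances reveals that $W$ and $V(x)$ are independent (also as processes), and $W$ is mean-zero Gaussian with variance $p_0(1-p_0)$. Furthermore, as processes,
$$V(x)\stackrel{d}{=}\sqrt{p_0}\;\mathbb U\!\left(\frac{F_0(x)-F_0(a)}{F_0(b)-F_0(a)}\right),\qquad\text{where}\qquad\frac{F_0(x)-F_0(a)}{F_0(b)-F_0(a)} = \frac{x-a}{b-a}\quad\text{for }x\in[a,b]\cap\mathcal W,$$
and $\mathbb U$ is a standard Brownian bridge independent of $W$. This decomposition is similar to that of Shorack and Wellner (1986, Exercise 2.2.11, page 32). Now, note that the Grenander operator satisfies
$$\mathrm{gren}_{[a,b]}(g)(x) = \beta+\frac{\gamma}{b-a}\,\mathrm{gren}_{[0,1]}(h)\!\left(\frac{x-a}{b-a}\right)\quad\text{if}\quad g(t) = \alpha+\beta t+\gamma\,h\!\left(\frac{t-a}{b-a}\right),\ \gamma>0.$$
It follows that
$$\mathrm{gren}_{[a,b]}\big(\mathbb U^{\mathrm{mod}}_{F_0}\big)(x_0)\stackrel{d}{=}\frac{Z}{b-a}+\frac{\sqrt{p_0}}{b-a}\,\mathrm{gren}\big(\mathbb U^{\mathrm{mod}}\big)\!\left(\frac{x_0-a}{b-a}\right),$$
with $Z$, $\mathbb U$ defined as in the Remark. The full result follows since, for $x\in[a,b]$,
$$\mathbb U^{\mathrm{mod}}_{F_0}(x)\stackrel{d}{=}\mathbb U_{F_0}(a)+\frac{x-a}{b-a}\,Z+\sqrt{p_0}\,\mathbb U^{\mathrm{mod}}\!\left(\frac{x-a}{b-a}\right).$$

□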
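The covariance calculation behind the independence of $W$ and $V$ can be checked mechanically. The sketch below uses only the Brownian bridge covariance $\mathrm{Cov}(\mathbb U(x),\mathbb U(y)) = \min(x,y)-xy$; the values $0.3$ and $0.7$ standing in for $F_0(a)$ and $F_0(b)$ are arbitrary, and the function names are hypothetical. It confirms that $\mathrm{Var}(W) = p_0(1-p_0)$, that $W$ and $V(x)$ are uncorrelated (hence independent, being jointly Gaussian), and that $V$ has the covariance of $\sqrt{p_0}$ times a standard Brownian bridge run in the rescaled time $(F_0(x)-F_0(a))/p_0$.

```python
def bridge_cov(x, y):
    """Cov(U(x), U(y)) for a standard Brownian bridge U on [0, 1]."""
    return min(x, y) - x * y


def inc_cov(s, x, y):
    """Cov(U(x) - U(s), U(y) - U(s))."""
    return bridge_cov(x, y) - bridge_cov(x, s) - bridge_cov(s, y) + bridge_cov(s, s)


s, t = 0.3, 0.7            # arbitrary stand-ins for F0(a) and F0(b)
p0 = t - s
var_W = inc_cov(s, t, t)   # W = U(t) - U(s)


def beta(u):
    """Rescaled time (u - s) / p0, which is also the regression coefficient on W."""
    return (u - s) / p0


def cov_W_V(u):
    """Cov(W, V(u)) with V(u) = U(u) - U(s) - beta(u) W."""
    return inc_cov(s, u, t) - beta(u) * var_W


def cov_V(u1, u2):
    """Cov(V(u1), V(u2))."""
    return (inc_cov(s, u1, u2) - beta(u1) * inc_cov(s, u2, t)
            - beta(u2) * inc_cov(s, u1, t) + beta(u1) * beta(u2) * var_W)


us = [s + p0 * k / 10 for k in range(11)]   # points of [F0(a), F0(b)]
print(var_W, max(abs(cov_W_V(u)) for u in us))
```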



3. $n^{-1/2}$ convergence of linear functionals



Consider a density $f_0$ with support $S_0$ and let $\tilde f_0$ denote its KL projection. The results for linear functionals hold under the following assumptions.

(S1). The support, $S_0$, of $f_0$ is bounded.

We write $S_0 = S_c\cup S_f$, where $S_c$ denotes the portion of the support where $\tilde f_0$ is curved and $S_f$ denotes the portion of the support where $\tilde f_0$ is flat. We assume also that $\tilde f_0$ is differentiable on $S_c$.

(C1). Define $\gamma_0 = \sup_{x\in S_c}|\tilde f_0'(x)|/\inf_{x\in S_c}\tilde f_0^2(x)$. Then $\gamma_0<\infty$.
(C2). Define $\beta_0 = \inf_{x\in S_c}|\tilde f_0'(x)/\tilde f_0^2(x)|$. Then $\beta_0>0$.
(F1). On $S_f$, the KL projection can be written as
$$\tilde f_0(x) = \sum_{j=1}^J q_j\,1_{I_j}(x),$$
where $q_1>q_2>\cdots>q_J>0$, $J$ is finite, and the intervals are disjoint and each is of the form $I_j = [a_j,b_j]$.
(F2). $f_0(x)>0$ for all $x\in S_f\cap\mathcal M$.

Let $g: S_0\to\mathbb R$ and define $\tilde\mu_n(g)$ by (1.1). Then we require that $g$ satisfy the following conditions.

(G1). $g$ is differentiable on $S_c$ with $\int_{S_c}|g'(x)|\,dx<\infty$ and $\sup_{x\in S_c}|g(x)|<\infty$.
(G2). $g\in L_\beta(S_f)$ for some $\beta>2$.

In order to state our main result for linear functionals we need to define the following functions,
$$g_j(u) = g\big((b_j-a_j)u+a_j\big),\quad u\in[0,1],\qquad\bar g_j = \frac{1}{b_j-a_j}\int_{I_j}g(x)\,dx,\tag{3.1}$$
$$\bar g(x) = \begin{cases}g(x), & x\in S_c,\\ \bar g_j, & x\in I_j,\ j = 1,\dots,J.\end{cases}\tag{3.2}$$
Thus, $\bar g_1,\dots,\bar g_J$ are the local averages of the function $g$, and each $g_j(u)$ is a localized version of $g$ on $I_j$.



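Theorem 3.1. Suppose that the density $f_0$ satisfies conditions (S1), (C1), (C2), (F1), and (F2). Consider a function $g: S_0\to\mathbb R$ which satisfies conditions (G1) and (G2). Let $\mathbb U,\mathbb U_1,\dots,\mathbb U_J$ denote independent Brownian bridges, let $\mathbb U_{F_0}(x) = \mathbb U(F_0(x))$, and define $\mathbb U_j^{\mathrm{mod}}$ as in Theorem 2.3, in the rescaled form of Remark 2.4 with $[a,b]$ replaced by $I_j$. Let $p_j = F_0(b_j)-F_0(a_j)$. Then
$$\sqrt n\big(\tilde\mu_n(g)-\tilde\mu_0(g)\big)\Rightarrow\int_{S_0}\bar g(x)\,d\mathbb U_{F_0}(x)+\sum_{j=1}^J\sqrt{p_j}\int_0^1g_j(u)\,\mathrm{gren}\big(\mathbb U_j^{\mathrm{mod}}\big)(u)\,du.$$
Furthermore, $\int_{S_0}\bar g(x)\,d\mathbb U_{F_0}(x) = \int_{S_0}\bar g(x)\,d\mathbb U_{\tilde F_0}(x)$. Also, if $I_j\cap\mathcal W = \{a_j,b_j\}$, then $\mathrm{gren}(\mathbb U_j^{\mathrm{mod}}) = 0$.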

It follows that $\sqrt n(\tilde\mu_n(g)-\tilde\mu_0(g))$ will converge to a Gaussian limit for true density (2.2) but not for (2.1), as the latter has well-specified flat regions. A simulation for (2.2) is shown in Figure 3. The proof of Theorem 3.1 is given in Section 5.
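For (2.2) with $g(x) = x$, the only flat interval is $I_1 = (0.25,1]$ and $I_1\cap\mathcal W = \{0.25,1\}$, so the limit in Theorem 3.1 reduces to $\int_{S_0}\bar g\,d\mathbb U_{F_0}$. Such a Brownian bridge integral is centered Gaussian with variance $\mathrm{Var}_{f_0}(\bar g(X))$, which the sketch below evaluates by the midpoint rule (the grid size is our own choice). The result is the variance of the Gaussian limit against which simulations such as Figure 3 can be compared; the exact value works out to $0.0703216\ldots$

```python
import numpy as np

N = 200_000
mid = (np.arange(N) + 0.5) / N                   # midpoints of a uniform grid on [0, 1]
f0 = np.where((mid > 0.4) & (mid < 0.6), 0.04, 12.0 * (mid - 0.5) ** 2)  # density (2.2)

a1, b1 = 0.25, 1.0                               # flat interval I_1 of the KL projection
g = mid                                          # g(x) = x
g_bar1 = (b1**2 - a1**2) / (2.0 * (b1 - a1))     # local average of g over I_1, as in (3.1)
g_bar = np.where(mid <= a1, g, g_bar1)           # the function \bar g of (3.2)

m1 = np.sum(g_bar * f0) / N                      # E_{f0}[\bar g(X)]
m2 = np.sum(g_bar**2 * f0) / N                   # E_{f0}[\bar g(X)^2]
limit_var = m2 - m1**2                           # variance of the Gaussian limit
print(g_bar1, limit_var)
```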

The simulations show a systematic bias prior to convergence (the empirical quantiles appear on the x-axis in Figure 3, so the negative bias translates to a left shift in the plot). The proof of Proposition 5.1 shows that one source of the bias is the term
$$\sqrt n\int x\,d(\tilde F_n-\mathbb F_n) = -\sqrt n\int(\tilde F_n-\mathbb F_n)\,dx\le0,$$
which is non-positive because $\tilde F_n\ge\mathbb F_n$. When $S_0 = S_c$, this term is the only source of bias, and from Kiefer and Wolfowitz (1976), it converges to zero at a rate of at least $n^{-1/6}(\log n)^{2/3}$. Since (3) of Proposition 2.2 also holds at the empirical level, similar behavior will be seen for all increasing functions $g$.

Figure 3. Empirical quantiles of $\sqrt n(\tilde\mu_n(g)-\tilde\mu_0(g))$ with $g(x) = x$ vs. the true quantiles of the limiting distribution for $f_0$ given in (2.2).
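The empirical version of point (3) of Proposition 2.2 is easy to verify directly: since $\tilde F_n\ge\mathbb F_n$, with both equal to one beyond the largest observation, $\int x\,d\tilde F_n\le\int x\,d\mathbb F_n$ for every sample. The following sketch (hypothetical helpers; data from (2.1) with an arbitrary seed) computes the mean of the Grenander estimator from the slopes of the least concave majorant and confirms that the bias term $\sqrt n(\tilde\mu_n(g)-\mu_n(g))$, with $\mu_n(g)$ the sample average of $g(X_i)$, is non-positive for $g(x) = x$.

```python
import numpy as np


def lcm_knots(x, y):
    """Vertices of the least concave majorant of (x[i], y[i]), x increasing."""
    hull = []
    for px, py in zip(np.asarray(x, float).tolist(), np.asarray(y, float).tolist()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()  # last vertex is on or below the chord
            else:
                break
        hull.append((px, py))
    hx, hy = (np.array(v) for v in zip(*hull))
    return hx, hy


def sample_ex21(n, rng):
    """Draw n points from example (2.1)."""
    left = rng.uniform(size=n) < 0.75
    x = np.empty(n)
    x[left] = rng.uniform(0.0, 0.5, size=left.sum())
    v = rng.uniform(size=(~left).sum())
    x[~left] = (1.0 + np.sqrt(1.0 + 8.0 * v)) / 4.0
    return x


def grenander_mean(sample):
    """Return (int x f_n(x) dx, int f_n(x) dx) for the piecewise-constant Grenander f_n."""
    xs = np.sort(sample)
    n = xs.size
    hx, hy = lcm_knots(np.concatenate(([0.0], xs)), np.arange(n + 1) / n)
    slopes = np.diff(hy) / np.diff(hx)
    mass = float(np.sum(slopes * np.diff(hx)))
    mean = float(np.sum(slopes * np.diff(hx**2)) / 2.0)
    return mean, mass


rng = np.random.default_rng(7)
results = []
for n in (50, 500, 5000):
    x = sample_ex21(n, rng)
    mu_tilde, mass = grenander_mean(x)
    bias_term = np.sqrt(n) * (mu_tilde - x.mean())
    results.append((n, mu_tilde, x.mean(), mass, bias_term))
    print(n, mu_tilde, x.mean(), bias_term)
```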

Finally, we make a few comments on the assumptions required for Theorem 3.1 to hold. These do appear long, but have been split up to facilitate reading of the different assumptions which are required on the different portions of the KL density $\tilde f_0$. The assumption that $S_0$ is bounded, along with (C1) and (C2), is standard in order for results to hold when $\tilde f_0$ is "curved". The particular form is taken from Kiefer and Wolfowitz (1976), but some version of these has been assumed in either a global or local sense when proving asymptotics by Durot et al. (2012), Kulikov and Lopuhaä (2008), or Groeneboom et al. (1999), to name but a few. It may be possible to relax these, perhaps using the ideas in Kiefer and Wolfowitz (1977), but this is beyond the scope of the current work. The strongest part of assumption (F1) is that $J$ is finite, while (F2) is required to get the tight exponential bounds under misspecification. Assumptions (G1) and (G2) are what is required of the function $g$ in the linear functional, and these are different on the curved and flat regions. Since $S_0$ is bounded, numerous functions will satisfy both (G1) and (G2).

We also note that in the case that $S_0 = S_c$, the ideas of van de Geer (2003) are of interest. van de Geer (2003) gives an overview of when one can expect asymptotic efficiency of plug-in estimators, such as $\tilde\mu_n(g)$. Although the requirements for asymptotic efficiency do depend on $g$, the main requirement of $f_0$ is that it have no "flat" regions. The cases considered in van de Geer (2003) are of the well-specified kind, but we now know that this will always be the case when the KL projection density is curved.



Remark 3.2. Marginal properties of the process $\mathrm{gren}(\mathbb U)$ were studied in Carolan and Dykstra (2001). The results include marginal densities and moments, and in particular
$$E[\mathrm{gren}(\mathbb U)(x)] = \frac{4(1-2x)}{3\sqrt{2\pi x(1-x)}}\qquad\text{and}\qquad E\big[(\mathrm{gren}(\mathbb U)(x))^2\big] = 0.5\left(\frac{x}{1-x}+\frac{1-x}{x}\right).$$
Note that it follows that $E\big[\int_0^1(\mathrm{gren}(\mathbb U)(x))^2\,dx\big] = \int_0^1(1-x)/x\,dx = \infty$, and hence the process $\mathrm{gren}(\mathbb U)$ is not defined in $L_2$. We would therefore not expect convergence of $\tilde\mu_n(g)$ for $g\in L_\beta(S_f)$ with $\beta\in[1,2]$.
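The divergence in Remark 3.2 can be made explicit. By the symmetry $x\mapsto1-x$ of the second-moment formula, for $0<\varepsilon<1/2$,
$$\int_\varepsilon^{1-\varepsilon}\frac12\left(\frac{x}{1-x}+\frac{1-x}{x}\right)dx = \int_\varepsilon^{1-\varepsilon}\frac{1-x}{x}\,dx = \log\frac{1-\varepsilon}{\varepsilon}-(1-2\varepsilon)\longrightarrow\infty\quad\text{as }\varepsilon\downarrow0,$$
so the expected squared $L_2([0,1])$ norm of $\mathrm{gren}(\mathbb U)$ is infinite, with the divergence coming logarithmically from both endpoints.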
Figure 4. Empirical quantiles of $\sqrt n(T(\tilde f_n)-T(\tilde f_0))$ vs. the true quantiles of the limiting distribution for $f_0$ given in (2.1).



4. Beyond linear functionals: a special case 

Entropy measures the amount of disorder or uncertainty in a system and is closely related to the Kullback-Leibler divergence. Let $T(f) = \int f(x)\log f(x)\,dx$ denote the entropy functional. A modified version of the entropy functional was used in Cule et al. (2010) in the context of testing for log-concavity. A review of testing and other applications of entropy appears, for example, in Beirlant et al. (1997).



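Theorem 4.1. Suppose that $f_0$ is bounded, the support of $f_0$ is also bounded, and that $f_0/\tilde f_0\le c_0<\infty$. Then
$$\sqrt n\big(T(\tilde f_n)-T(\tilde f_0)\big)\Rightarrow\sigma Z,$$
where $Z$ is a standard normal random variable and
$$\sigma^2 = \mathrm{Var}_{f_0}\big(\log\tilde f_0(X)\big) = \mathrm{Var}_{\tilde f_0}\big(\log\tilde f_0(X)\big).$$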
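For example (2.1) the variance in Theorem 4.1 has a closed form. There, $\tilde f_0$ takes the values $1.5$ and $0.5$, with probabilities $0.75$ and $0.25$ under both $f_0$ and $\tilde f_0$, so $\sigma^2 = 0.75\cdot0.25\cdot(\log3)^2\approx0.226$. The sketch below (midpoint rule on a grid of our own choosing) checks this value, the equality of the two variances, and the value of $T(\tilde f_0)$.

```python
import numpy as np

N = 400_000
mid = (np.arange(N) + 0.5) / N                   # midpoints of a uniform grid on [0, 1]
f0 = np.where(mid <= 0.5, 1.5, mid - 0.25)       # true density (2.1)
f_tilde = np.where(mid <= 0.5, 1.5, 0.5)         # its KL projection
log_ft = np.log(f_tilde)


def var_under(density, h):
    """Variance of h(X) when X has the given density on [0, 1] (midpoint rule)."""
    m1 = np.sum(h * density) / N
    m2 = np.sum(h**2 * density) / N
    return m2 - m1**2


sigma2_f0 = var_under(f0, log_ft)                # Var_{f0}(log f~0(X))
sigma2_ft = var_under(f_tilde, log_ft)           # Var_{f~0}(log f~0(X))
T_tilde = np.sum(f_tilde * log_ft) / N           # T(f~0) = int f~0 log f~0
print(sigma2_f0, sigma2_ft, T_tilde)
```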

The proof is made up of two key pieces: (1) tight bounds on the likelihood ratio from Lemma 4.2 and (2) specialized equalities which hold for the Grenander estimator.



Lemma 4.2. Suppose that $f_0$ is bounded, the support of $f_0$ is also bounded, and that $f_0/\tilde f_0\le c_0<\infty$. Then
$$\int\log\frac{\tilde f_n}{\tilde f_0}\,d\mathbb F_n = O_p(n^{-2/3}).$$



We note that the conditions we require here are stronger than those of Patilea (2001, Corollary 5.6). However, under those conditions Patilea (2001) establishes convergence rates on $\int\log\frac{2\tilde f_n}{\tilde f_n+\tilde f_0}\,d\mathbb F_n$, which is not sufficient for our purposes. The condition that $f_0/\tilde f_0$ is bounded above was also used in the study of misspecification in van de Geer (2000, Section 10.4). The condition that the support of $f_0$ is bounded is the strongest, whereas the condition that $f_0$ is bounded may be relaxed somewhat. We discuss this further at the end of Section 6.2.1 in the Appendix.
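Proof. We first show that $\int\varphi(\tilde f_n)\,d(\tilde F_n-\mathbb F_n) = 0$ for any function $\varphi$. This follows since $\tilde F_n\ge\mathbb F_n$ with equality at finitely many touch points, and also $\tilde f_n$ is constant between all touch points. Thus, letting $\tau_j$ enumerate the (random) points of touch, we have
$$\int_0^\infty\varphi(\tilde f_n)\,d(\tilde F_n-\mathbb F_n) = \sum_j\varphi(\tilde f_n(\tau_j))\big((\tilde F_n-\mathbb F_n)(\tau_j)-(\tilde F_n-\mathbb F_n)(\tau_{j-1})\big) = 0,$$
with $\tau_0 = 0$. A similar argument also establishes that
$$\int\varphi(\tilde f_0)\,d(\tilde F_0-F_0) = 0.\tag{4.1}$$
For $\varphi(y) = \log y$, it follows that
$$\begin{aligned}
\sqrt n\big(T(\tilde f_n)-T(\tilde f_0)\big) &= \sqrt n\left(\int\log\tilde f_n\,d\mathbb F_n-\int\log\tilde f_0\,dF_0\right)\\
&= \sqrt n\int\log\frac{\tilde f_n}{\tilde f_0}\,d\mathbb F_n+\sqrt n\int\log\tilde f_0\,d(\mathbb F_n-F_0).
\end{aligned}$$
The first term is $O_p(n^{-1/6})$ by Lemma 4.2. By the central limit theorem, the second term has a Gaussian limit with variance $\mathrm{Var}_{f_0}(\log\tilde f_0(X))$. By (4.1) (with $\varphi(y) = \log y$ and $\varphi(y) = (\log y)^2$), this is equal to $\mathrm{Var}_{\tilde f_0}(\log\tilde f_0(X))$.

□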
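The first identity in the proof says that, for the Grenander estimator, $T(\tilde f_n) = \int\tilde f_n\log\tilde f_n\,dx$ coincides with the plug-in average $n^{-1}\sum_i\log\tilde f_n(X_i)$. A minimal numerical check, with data drawn from (2.1), an arbitrary seed, and hypothetical helper names:

```python
import numpy as np


def lcm_knots(x, y):
    """Vertices of the least concave majorant of (x[i], y[i]), x increasing."""
    hull = []
    for px, py in zip(np.asarray(x, float).tolist(), np.asarray(y, float).tolist()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()  # last vertex is on or below the chord
            else:
                break
        hull.append((px, py))
    hx, hy = (np.array(v) for v in zip(*hull))
    return hx, hy


def sample_ex21(n, rng):
    """Draw n points from example (2.1)."""
    left = rng.uniform(size=n) < 0.75
    x = np.empty(n)
    x[left] = rng.uniform(0.0, 0.5, size=left.sum())
    v = rng.uniform(size=(~left).sum())
    x[~left] = (1.0 + np.sqrt(1.0 + 8.0 * v)) / 4.0
    return x


rng = np.random.default_rng(1997)
xs = np.sort(sample_ex21(2000, rng))
n = xs.size
hx, hy = lcm_knots(np.concatenate(([0.0], xs)), np.arange(n + 1) / n)
slopes = np.diff(hy) / np.diff(hx)               # values of the Grenander estimator

T_hat = float(np.sum(slopes * np.log(slopes) * np.diff(hx)))   # int f_n log f_n dx
# Left-continuous evaluation: X_(i) lies in (hx[k-1], hx[k]] with k from searchsorted.
at_data = slopes[np.searchsorted(hx, xs, side="left") - 1]
plug_in = float(np.mean(np.log(at_data)))       # n^{-1} sum_i log f_n(X_i)
print(T_hat, plug_in)
```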



A simulation of this result is shown in Figure 4, based on the true density (2.1). Note that this density has well-specified flat regions, and therefore linear functionals that do not ignore $S_f\cap\mathcal W$ should have non-Gaussian terms in their limit. On the other hand, the entropy functional will always result in a Gaussian limit. The simulations exhibit a systematic positive bias. The proof shown above reveals the cause: the term $\int\log(\tilde f_n/\tilde f_0)\,d\mathbb F_n\ge0$, since $\tilde f_n$ is the MLE. In the plots the quantiles of $\sqrt n(T(\tilde f_n)-T(\tilde f_0))$ are shown on the x-axis, and these quantiles appear to be shifted to the right; that is, they are larger than the quantiles of the limiting Gaussian distribution.



5. Proofs for Section 3

We now present the proof of Theorem 3.1. We proceed by proving convergence results for the different types of behaviour of the density separately (curved, flat, misspecified), and combine the results together at the end. We believe that the intermediate results are of independent interest to the reader, and we also hope that this approach makes the proof more accessible.
5.1. Strictly curved well-specified density. We first suppose that the true density $f_0$ satisfies the conditions introduced in Kiefer and Wolfowitz (1976).

Proposition 5.1. Suppose that the support $S_0 = S_c$ is bounded and that $f_0$ satisfies conditions (C1) and (C2). Suppose further that $g$ satisfies condition (G1). Then
$$\sqrt n\big(\tilde\mu_n(g)-\mu_0(g)\big)\Rightarrow\sigma Z,$$
where $Z$ is a standard normal random variable and $\sigma^2 = \mathrm{Var}(g(X))<\infty$.

We note that this result is similar to that in Kulikov and Lopuhaä (2008). The conditions required in Kulikov and Lopuhaä (2008) are slightly different from those used in Kiefer and Wolfowitz (1976).

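Proof. Without loss of generality, we assume that $S_0 = S_c = [0,1]$. Let $\mu_n(g) = n^{-1}\sum_{i=1}^n g(X_i)$ denote the empirical estimator of $\mu_0(g)$. Using integration by parts, we have
$$\begin{aligned}
|\tilde\mu_n(g)-\mu_n(g)| &= \left|g(x)\big[\tilde F_n(x)-\mathbb F_n(x)\big]\Big|_{x=0}^{x=1}-\int_0^1g'(x)\big[\tilde F_n(x)-\mathbb F_n(x)\big]\,dx\right|\\
&\le\left\{\max_{x\in S_c}|g(x)|+\int_{S_c}|g'(x)|\,dx\right\}\sup_{x\in S_c}\big|\tilde F_n(x)-\mathbb F_n(x)\big|.
\end{aligned}$$
Now, by the results of Kiefer and Wolfowitz (1976), we have $\sup_{x\in[0,1]}\sqrt n|\tilde F_n(x)-\mathbb F_n(x)| = O_p(n^{-1/6}\log^{2/3}n)$. Therefore,
$$\sqrt n\big(\tilde\mu_n(g)-\mu_0(g)\big) = \sqrt n\big(\mu_n(g)-\mu_0(g)\big)+O_p(n^{-1/6}\log^{2/3}n),$$
and the result follows from the central limit theorem applied to $\mu_n(g)$.

□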
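The integration-by-parts bound above is deterministic and can be checked on any sample. The sketch below uses the hypothetical density $f_0(x) = 2(1-x)$ on $[0,1]$, which is strictly decreasing but has $\inf f_0^2 = 0$ and so does not satisfy (C1); the bound itself needs neither (C1) nor (C2). It takes $g(x) = x^2$, for which $\max|g| = 1$ and $\int_0^1|g'| = 1$, and uses the fact that $\sup|\tilde F_n-\mathbb F_n|$ is approached just to the left of the order statistics. Helper names and the seed are our own.

```python
import numpy as np


def lcm_knots(x, y):
    """Vertices of the least concave majorant of (x[i], y[i]), x increasing."""
    hull = []
    for px, py in zip(np.asarray(x, float).tolist(), np.asarray(y, float).tolist()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()  # last vertex is on or below the chord
            else:
                break
        hull.append((px, py))
    hx, hy = (np.array(v) for v in zip(*hull))
    return hx, hy


rng = np.random.default_rng(1976)
n = 2000
xs = np.sort(1.0 - np.sqrt(1.0 - rng.uniform(size=n)))   # inverse CDF of f0(x) = 2(1 - x)
hx, hy = lcm_knots(np.concatenate(([0.0], xs)), np.arange(n + 1) / n)
slopes = np.diff(hy) / np.diff(hx)

mu_tilde = float(np.sum(slopes * np.diff(hx**3)) / 3.0)  # int g f~n with g(x) = x^2
mu_emp = float(np.mean(xs**2))                            # empirical mean of g(X_i)

# F~n(X_(i)) - (i - 1)/n is the gap just to the left of the i-th order statistic.
sup_gap = float(np.max(np.interp(xs, hx, hy) - np.arange(n) / n))
bound = (1.0 + 1.0) * sup_gap                             # (max|g| + int|g'|) * sup gap
print(abs(mu_tilde - mu_emp), bound)
```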



5.2. Piecewise constant well-specified density. Suppose next that $S_0 = S_f = \mathcal W\cap S_0$. That is, the true density is piecewise constant decreasing and can be expressed as
$$f_0(x) = \sum_{j=1}^J q_j\,1_{I_j}(x),\tag{5.1}$$
where $q_1>q_2>\cdots>q_J>0$ and $\cup_jI_j = S_0$, where the sets $I_j = (a_j,b_j]$ are disjoint. Indeed, we have $a_1 = 0$ and $a_{j+1} = b_j$. Notice that $1 = \sum_{j=1}^Jq_j(b_j-a_j)$. We also define
$$p_j = q_j(b_j-a_j) = P(a_j<X\le b_j),$$
where $X$ is a random variable with density $f_0$. Also, let $\mathbb U_1,\dots,\mathbb U_J$ denote independent standard Brownian bridge processes (each defined on $[0,1]$), and let $(Z_1,\dots,Z_J)$ be an independent multivariate normal with mean zero and covariance $\mathrm{diag}(p)-pp^T$ for $p = (p_1,\dots,p_J)^T$.
Proposition 5.2. Suppose that $f_0$ is as in (5.1). Then $\sqrt n(\tilde f_n(x)-f_0(x))$ converges weakly in $L_\alpha(S_f) = L_\alpha(S_0)$, for any $\alpha\in[1,2)$, to
$$\sum_{j=1}^J\frac{1}{b_j-a_j}\left\{Z_j+\sqrt{p_j}\,\mathrm{gren}(\mathbb U_j)\!\left(\frac{x-a_j}{b_j-a_j}\right)\right\}1_{I_j}(x).$$



A pointwise version of Proposition 5.2 was originally proved in Carolan and Dykstra (1999). Here, we extend these results to convergence in $L_\alpha$, which is a much stronger statement, requiring tight bounds on the tail behavior at a point of the kind proved in Groeneboom et al. (1999, Theorem 2.1). In the case of the decreasing probability mass function, $\ell_k$ convergence for $k>1$ has been established in Jankowski and Wellner (2009). An immediate corollary of this work is convergence of the linear functionals $\tilde\mu_n(g)$; see Corollary 5.3 below.



Groeneboom (1986, Theorem 4.1) shows that for $f_0$ equal to the uniform density on $[0,1]$ we have
$$\sqrt n\int_0^1\big|\tilde f_n(x)-f_0(x)\big|\,dx\Rightarrow\int_0^1\big|\mathrm{gren}(\mathbb U)(x)\big|\,dx = 2\sup_{0\le x\le1}\mathbb U(x),$$
where $\mathbb U$ is again a standard Brownian bridge process on $[0,1]$. This is an immediate corollary of Proposition 5.2 with $J = 1$. On the other hand, Groeneboom (1983) (see also Groeneboom and Pyke (1983)) shows that
$$\frac{\int_0^1\big(\sqrt n(\tilde f_n(x)-f_0(x))\big)^2dx-\log n}{\sqrt{3\log n}}\Rightarrow N(0,1),$$
and hence convergence of $\sqrt n(\tilde f_n(x)-f_0(x))$ to $\mathrm{gren}(\mathbb U)(x)$ in $L_2([0,1])$ fails. See also Remark 3.2.
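The identity $\int_0^1|\mathrm{gren}(\mathbb U)(x)|\,dx = 2\sup_x\mathbb U(x)$ holds pathwise: the least concave majorant of a path vanishing at $0$ and $1$ rises to the maximum of the path and then falls back to zero, so its total variation is twice that maximum. The sketch below checks this on a discretized Brownian bridge (grid size and seed are our own choices; the helper name is hypothetical).

```python
import numpy as np


def lcm_knots(x, y):
    """Vertices of the least concave majorant of (x[i], y[i]), x increasing."""
    hull = []
    for px, py in zip(np.asarray(x, float).tolist(), np.asarray(y, float).tolist()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()  # last vertex is on or below the chord
            else:
                break
        hull.append((px, py))
    hx, hy = (np.array(v) for v in zip(*hull))
    return hx, hy


rng = np.random.default_rng(1986)
m = 5000
t = np.linspace(0.0, 1.0, m + 1)
w = np.concatenate(([0.0], np.cumsum(rng.normal(scale=np.sqrt(1.0 / m), size=m))))
u = w - t * w[-1]                    # discretized Brownian bridge with u[0] = u[-1] = 0

hx, hy = lcm_knots(t, u)
slopes = np.diff(hy) / np.diff(hx)   # gren(U) on the grid, piecewise constant
l1_norm = float(np.sum(np.abs(slopes) * np.diff(hx)))
print(l1_norm, 2.0 * u.max())
```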



Corollary 5.3. Suppose that $f_0$ takes the form (5.1) with bounded support $S_0 = S_f\cap\mathcal W$ and with a finite $J$. Suppose further that $g$ satisfies condition (G2). Then $\sqrt n(\tilde\mu_n(g)-\mu_0(g))\Rightarrow Y_J$, where
$$Y_J = \sum_{j=1}^J\left\{\bar g_jZ_j+\sqrt{p_j}\int_0^1g_j(u)\,\mathrm{gren}(\mathbb U_j)(u)\,du\right\},$$
with $g_j$ and $\bar g_j$ defined in (3.1).

In the proofs which follow, unless stated otherwise, we assume that $S_0 = [0,1]$.



Lemma 5.4. Suppose that $f_0$ is as in (5.1) with a discontinuity at a point $x_0\ne0$. Then, for all $c>0$,
$$\sup_{0<x\le c/n}\big|\tilde f_n(x_0+x)-f_0(x_0+x)\big| = O_p(1).$$
Proof. It was shown in Anevski and Hössjer (2002, Theorem 2) that
$$\tilde f_n(x_0+t/n)-\frac{f_0(x_0-)+f_0(x_0+)}{2}\Rightarrow h(t),\tag{5.2}$$
where $h(t)$ is the left derivative of the least concave majorant (over $\mathbb R$) of the process
$$\mathbb N(\Lambda(s))-\Lambda(s)-\frac{f_0(x_0-)-f_0(x_0+)}{2}\,|s|,$$
where the rate function is equal to
$$\Lambda(s) = \begin{cases}f_0(x_0+)\,s, & s>0,\\ f_0(x_0-)\,s, & s<0.\end{cases}$$
Here, $\mathbb N$ denotes a standard two-sided Poisson process. The result in Anevski and Hössjer (2002, Theorem 2) is established by a "switching" argument similar to that in the proof of Theorem 2.3. The switching argument can also be extended to this situation even if $f_0(x_0-) = f_0(x_0+)$. A similar argument may also be used to show convergence in finite-dimensional distributions as well. We next show convergence of the supremum norm
$$\sup_{0<x\le c/n}\big|\tilde f_n(x_0+x)-f_0(x_0+x)\big|\Rightarrow\sup_{0<x\le c}\left|h(x)+\frac{f_0(x_0-)-f_0(x_0+)}{2}\right| = O_p(1).$$
This is done by (1) showing that the convergence in (5.2) also holds in $D[0,\infty)$, and (2) showing that this implies convergence of the supremum norm (as above). Both of these steps follow exactly the same argument as the proof of Theorem 1.1 in Balabdaoui et al. (2011), and we therefore omit the details. □



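Lemma 5.5. Suppose that $f_0$ is decreasing on $\mathbb R_+$ and flat on $(a,b]$, and fix $x\in(a,b)$. Then, for any $t_0>0$ and $k_0>0$, there exists a constant $c_0 = t_0/(f_0(b)+t_0/k_0)$ such that
$$P\big(\tilde f_n(x)>f_0(x)+n^{-1/2}t\big)\le\exp\left\{-\frac{c_0\,t(x-a)}{2}\right\}\quad\text{for all }t\ge t_0,$$
for all $n\ge(k_0/3)^2$. Also,
$$P\big(\tilde f_n(x)<f_0(x)-n^{-1/2}t\big)\le\exp\left\{-\frac{t^2(b-x)}{2f_0(x)}\right\}\quad\text{for }t<\sqrt n\,f_0(x),$$
and otherwise the probability is equal to zero.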
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Proof. Let F„(a, s) = F„(s) - F„(a), and we write /o(a+) = lim^-^a+ foix). By the 
switching relation, 



P[fn{x)> fo{x)+n-'/H^ 

= P (argmaXgg[o y{F„(s) - (/o(x) + n~^/'^t)s} > x) 

= P (argmax^gjo y{F„(a, s) - (/o(a+) + - a)} > x) 

< P (F„(a, s) > (/o(a+) + n-^/H){s - a)} for some s G {x, 1]) 

^/ F„(a,g) ^ (/o(Q+)+n-V^t)(s-a) ^ 
= P — ^ > ; ^ tor some s G x, 1 

„ /F„(a,s) ^ , , 

= -P ^ > 1 + "TT ^ some s G (x, 1 

VFo(a,s) /o(a+) 

/ F^(a,g) ^ ^ , n-'/H 

< P sup -— r > 1 + —, r 

\se(x,i] Foia, s) fo{a+) 



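Since $n\mathbb F_n(a,s)$ is a binomial random variable, we can bound the above using Shorack and Wellner (1986, Inequality 10.3.2, page 416), with $h(v) = v(\log v-1)+1$ and $\psi(v) = 2h(1+v)/v^2\ge(1+v/3)^{-1}$. Since $F_0(a,x) = f_0(a+)(x-a)$, it therefore follows that

$$\begin{aligned}
P\left(\sup_{s\in(x,1]}\frac{\mathbb F_n(a,s)}{F_0(a,s)} > 1+\frac{n^{-1/2}t}{f_0(a+)}\right) &\le \exp\left\{-nF_0(a,x)\,h\left(1+\frac{n^{-1/2}t}{f_0(a+)}\right)\right\}\\
&= \exp\left\{-\frac{t^2(x-a)}{2f_0(a+)}\,\psi\left(\frac{n^{-1/2}t}{f_0(a+)}\right)\right\}\\
&\le \exp\left\{-\frac{t(x-a)}{2}\cdot\frac{t/f_0(a+)}{1+(t/f_0(a+))/(3\sqrt n)}\right\}.
\end{aligned}$$

Write $u = t/f_0(a+)$ and note that for all $n\ge(k_0/3)^2$ we have

$$\frac{u}{1+u/(3\sqrt n)}\ \ge\ \frac{u}{1+u/k_0},$$

which is an increasing function of $u$. Fix $t_0>0$ and let $u_0 = t_0/f_0(a+)$. Then, with $c_0 = u_0/(1+u_0/k_0) = t_0/(f_0(a+)+t_0/k_0)$, we have that

$$P\left(\hat f_n(x) > f_0(x)+n^{-1/2}t\right)\le\exp\left\{-c_0\,\frac{t(x-a)}{2}\right\}\quad\text{for all } t\ge t_0.$$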
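The argument here, and the one for the other tail, relies on a few elementary facts about $h(v) = v(\log v-1)+1$ and $\psi(v) = 2h(1+v)/v^2$. These are $\psi(v)\ge(1+v/3)^{-1}$ for $v\ge0$ and $\psi(-v)\ge1$ for $v\in[0,1]$. The argument also uses the fact that $u\mapsto u/(1+u/k_0)$ is increasing. The Python sketch below checks these numerically on a grid. It is only an illustration, not part of the proof, and the grid and the value $k_0 = 2$ are arbitrary choices.

```python
import numpy as np

def h(v):
    """h(v) = v (log v - 1) + 1 for v > 0, extended by continuity with h(0) = 1."""
    v = np.asarray(v, dtype=float)
    out = np.ones_like(v)
    pos = v > 0
    out[pos] = v[pos] * (np.log(v[pos]) - 1.0) + 1.0
    return out

def psi(v):
    """psi(v) = 2 h(1 + v) / v**2 for v >= -1 (psi(0) = 1 by continuity)."""
    v = np.asarray(v, dtype=float)
    out = np.ones_like(v)
    nz = v != 0
    out[nz] = 2.0 * h(1.0 + v[nz]) / v[nz] ** 2
    return out

# Upper tail: psi(v) >= 1 / (1 + v/3) for v >= 0.
v = np.linspace(0.0, 50.0, 5001)
upper_tail_ok = bool(np.all(psi(v) >= 1.0 / (1.0 + v / 3.0) - 1e-12))

# Lower tail: psi(-v) >= 1 for v in [0, 1].
w = np.linspace(0.0, 1.0, 1001)
lower_tail_ok = bool(np.all(psi(-w) >= 1.0 - 1e-12))

# u / (1 + u/k0) is increasing in u (illustrative k0 = 2).
k0 = 2.0
u = np.linspace(0.0, 100.0, 1001)
monotone_ok = bool(np.all(np.diff(u / (1.0 + u / k0)) > 0))

print(upper_tail_ok, lower_tail_ok, monotone_ok)
```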
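We handle the other side in a similar manner. Let $\mathbb F_n(s,b) = \mathbb F_n(b)-\mathbb F_n(s)$ and $F_0(s,b) = F_0(b)-F_0(s)$. Since $f_0$ is decreasing and flat on $(a,b]$, we have $F_0(s,b)\ge f_0(b)(b-s)$ for $s\in[0,x)$, and therefore

$$\begin{aligned}
P\left(\hat f_n(x) < f_0(x)-n^{-1/2}t\right) &= P\left(\operatorname{argmax}_{s\in[0,1]}\{\mathbb F_n(s)-(f_0(b)-n^{-1/2}t)s\} < x\right)\\
&\le P\left(\frac{\mathbb F_n(s,b)}{F_0(s,b)}\le 1-\frac{n^{-1/2}t}{f_0(b)}\ \text{for some } s\in[0,x)\right)\\
&\le P\left(\inf_{s\in[0,x)}\frac{\mathbb F_n(s,b)}{F_0(s,b)}\le 1-\frac{n^{-1/2}t}{f_0(b)}\right).
\end{aligned}$$

We now bound this using the martingale inequality from Groeneboom et al. (1999, Lemma 2.3):

$$\begin{aligned}
P\left(\inf_{s\in[0,x)}\frac{\mathbb F_n(s,b)}{F_0(s,b)}\le 1-\frac{n^{-1/2}t}{f_0(b)}\right) &\le \exp\left\{-nF_0(x,b)\,h\left(1-\frac{n^{-1/2}t}{f_0(b)}\right)\right\}\\
&= \exp\left\{-\frac{t^2(b-x)}{2f_0(b)}\,\psi\left(-\frac{n^{-1/2}t}{f_0(b)}\right)\right\}.
\end{aligned}$$

Now, note that since $\hat f_n$ is a density, it is nonnegative, so we only need to consider $t\le\sqrt n\,f_0(b) = \sqrt n\,f_0(x)$; for larger $t$ the probability is zero. Therefore, we need to bound $\psi(-v)$ only for $v\in[0,1]$, for which we have $\psi(-v)\ge1$. Thus it follows that

$$P\left(\hat f_n(x) < f_0(x)-n^{-1/2}t\right)\le\exp\left\{-\frac{t^2(b-x)}{2f_0(b)}\right\}.$$

□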
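The switching relation can be made concrete with a small example. Below is a minimal Python sketch of the Grenander estimator on $[0,\infty)$, computed as the left derivative of the least concave majorant of the empirical distribution function. The sketch also checks numerically that $\hat f_n(x) > c$ exactly when the (largest) maximiser of $s\mapsto\mathbb F_n(s)-cs$ lies to the right of $x$. The function names, the uniform sample and the random levels $c$ are illustrative assumptions. Ties between $x$ and the knots occur with probability zero and are ignored.

```python
import numpy as np

def grenander(sample):
    """Knots and slopes of the least concave majorant (LCM) of the empirical CDF.

    The estimator equals slopes[k] on (knots[k], knots[k + 1]] and is zero
    to the right of the largest observation.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))    # the support starts at 0
    py = np.arange(n + 1) / n          # empirical CDF at 0 and at each order statistic
    hull = [0]
    for i in range(1, n + 1):
        # Drop the last hull point while it lies on or below the chord to point i.
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (py[i2] - py[i1]) * (px[i] - px[i2]) <= (py[i] - py[i2]) * (px[i2] - px[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    slopes = np.diff(py[hull]) / np.diff(knots)
    return knots, slopes

def grenander_eval(knots, slopes, x):
    """Evaluate the left-continuous Grenander estimator at a point x > 0."""
    k = np.searchsorted(knots, x, side="left") - 1
    return float(slopes[k]) if k < slopes.size else 0.0

def argmax_tilted_ecdf(sample, c):
    """Largest maximiser of s -> F_n(s) - c s over s in {0} and the data points."""
    x = np.sort(np.asarray(sample, dtype=float))
    px = np.concatenate(([0.0], x))
    values = np.arange(x.size + 1) / x.size - c * px
    return px[np.flatnonzero(values == values.max())[-1]]

rng = np.random.default_rng(0)
sample = rng.uniform(size=200)
knots, slopes = grenander(sample)

# Switching relation: fhat(x) > c  <=>  argmax_s {F_n(s) - c s} > x.
agree = []
for _ in range(500):
    x0 = rng.uniform(0.01, 0.99)
    c = rng.uniform(0.2, slopes[0])
    agree.append((grenander_eval(knots, slopes, x0) > c) == (argmax_tilted_ecdf(sample, c) > x0))
print(all(agree))
```

Computing the majorant by a single left-to-right upper-hull pass keeps the sketch linear in $n$ after sorting.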



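Let $(x)_+ = \max(x,0)$ and $(x)_- = \min(x,0)$.

Lemma 5.6. Suppose that $f_0$ is flat on $(a,b]$, fix $x\in(a,b)$, and fix $\alpha>0$. Then there exists a constant $C$ such that

$$E\left[\left|\sqrt n\,(\hat f_n(x)-f_0(x))_-\right|^\alpha\right]\le C(b-x)^{-\alpha/2},\qquad E\left[\left|\sqrt n\,(\hat f_n(x)-f_0(x))_+\right|^\alpha\right]\le C(x-a)^{-\alpha/2},$$

with the second bound valid only for $(x-a)\ge c_0/n$, for some $c_0>0$.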
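Proof. Using the bounds obtained in Lemma 5.5, we find that

$$\begin{aligned}
E\left[\left|\sqrt n\,(\hat f_n(x)-f_0(x))_-\right|^\alpha\right] &= \int_0^\infty\alpha t^{\alpha-1}P\left(\left|\sqrt n\,(\hat f_n(x)-f_0(x))_-\right|>t\right)dt\\
&= \int_0^{n^{1/2}f_0(x)}\alpha t^{\alpha-1}P\left(\hat f_n(x)<f_0(x)-n^{-1/2}t\right)dt\\
&\le \int_0^\infty\alpha t^{\alpha-1}\exp\left\{-\frac{t^2(b-x)}{2f_0(b)}\right\}dt = \Gamma\left(\frac\alpha2+1\right)\left(\frac{2f_0(b)}{b-x}\right)^{\alpha/2}.
\end{aligned}$$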

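For the second inequality, we first fix $t_0>0$. We then have

$$\begin{aligned}
E\left[\left|\sqrt n\,(\hat f_n(x)-f_0(x))_+\right|^\alpha\right] &= \int_0^{t_0}\alpha t^{\alpha-1}P\left(\hat f_n(x)>f_0(x)+n^{-1/2}t\right)dt+\int_{t_0}^\infty\alpha t^{\alpha-1}P\left(\hat f_n(x)>f_0(x)+n^{-1/2}t\right)dt\\
&\le t_0^\alpha+\int_{t_0}^\infty\alpha t^{\alpha-1}\exp\left\{-c_0\,\frac{t(x-a)}{2}\right\}dt\\
&\le t_0^\alpha+\Gamma(\alpha+1)\left(\frac{2}{c_0(x-a)}\right)^\alpha.
\end{aligned}$$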

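Now, recall that $c_0$ takes the form $t_0/(f_0(b)+t_0/k_0)$. Therefore, we obtain the bound

$$\Gamma(\alpha+1)\left(\frac{2}{c_0(x-a)}\right)^\alpha = 2^\alpha\Gamma(\alpha+1)\left(\frac{f_0(b)+t_0/k_0}{t_0(x-a)}\right)^\alpha\le 2^\alpha\Gamma(\alpha+1)(1+K)^\alpha\left(\frac{f_0(b)}{t_0(x-a)}\right)^\alpha,$$

as long as $t_0/k_0\le Kf_0(b)$ for some choice of $K$. We optimize the entire quantity in $t_0$ to find that

$$E\left[\left|\sqrt n\,(\hat f_n(x)-f_0(x))_+\right|^\alpha\right]\le A_\alpha\left(\frac{f_0(b)}{x-a}\right)^{\alpha/2}$$

for some new constant $A_\alpha$. Now, in order for this optimized bound to hold, we need $t_0\le Kf_0(b)k_0$ for the optimizing value of $t_0$, which is of order $(f_0(b)/(x-a))^{1/2}$, and $n\ge(k_0/3)^2$. Taking $k_0 = 3\sqrt n$, the former translates to $(x-a)\ge c_0n^{-1}$ for some constant $c_0>0$. □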
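The proof uses two closed-form tail integrals:

- $\int_0^\infty\alpha t^{\alpha-1}e^{-ct^2}dt = \Gamma(\alpha/2+1)c^{-\alpha/2}$
- $\int_0^\infty\alpha t^{\alpha-1}e^{-\lambda t}dt = \Gamma(\alpha+1)\lambda^{-\alpha}$

It then minimises $t_0^\alpha+Bt_0^{-\alpha}$, whose minimiser is $t_0 = B^{1/(2\alpha)}$ with minimum value $2\sqrt B$. Because $B\propto(x-a)^{-\alpha}$, the optimised bound scales like $(x-a)^{-\alpha/2}$. The Python sketch below checks these facts numerically with a plain trapezoidal rule. The parameter values are arbitrary illustrative choices.

```python
import math
import numpy as np

def trapezoid(y, t):
    """Composite trapezoidal rule (avoids depending on a particular NumPy version)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

alpha, c, lam = 1.5, 0.7, 1.3
t = np.linspace(0.0, 60.0, 600_001)

gauss_num = trapezoid(alpha * t ** (alpha - 1) * np.exp(-c * t ** 2), t)
gauss_exact = math.gamma(alpha / 2 + 1) * c ** (-alpha / 2)

expo_num = trapezoid(alpha * t ** (alpha - 1) * np.exp(-lam * t), t)
expo_exact = math.gamma(alpha + 1) * lam ** (-alpha)

def optimised_bound(alpha, K, f0b, gap):
    """Minimum over t0 > 0 of t0**alpha + B * t0**(-alpha), where
    B = Gamma(alpha + 1) * (2 (1 + K) f0(b) / gap)**alpha and gap = x - a.
    Returns (minimum value, minimiser)."""
    B = math.gamma(alpha + 1) * (2 * (1 + K) * f0b / gap) ** alpha
    return 2 * math.sqrt(B), B ** (1 / (2 * alpha))

val1, t_star = optimised_bound(alpha, K=1.0, f0b=2.0, gap=0.1)
val2, _ = optimised_bound(alpha, K=1.0, f0b=2.0, gap=0.4)
ratio = val1 / val2            # should equal (0.1 / 0.4) ** (-alpha / 2)

# Brute-force check of the minimiser on a grid of t0 values.
B = t_star ** (2 * alpha)
grid = np.linspace(0.5 * t_star, 2.0 * t_star, 100_001)
grid_min = float(np.min(grid ** alpha + B * grid ** (-alpha)))
print(gauss_num, gauss_exact, expo_num, expo_exact, ratio, grid_min, val1)
```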
Proof of Proposition 5.2. The outline of the proof is as follows. We first require pointwise convergence, which follows from Carolan and Dykstra (1999, Theorem 6.4). One can also easily extend this to convergence in finite dimensional distributions. The particular form of the limit follows from the following decomposition of a (time-transformed) Brownian bridge, which is a generalization of Shorack and Wellner (1986, Exercise 2.2.11, page 32). Let $F$ denote any distribution function with compact support, which, without loss of generality, we assume to be $[0,1]$. Let $0 = a_1 < b_1 = a_2 < \cdots < b_J = 1$. Let $\mathbb V,\mathbb U_1,\dots,\mathbb U_J$ denote independent Brownian bridges. Then

$$\mathbb V(F(t))\stackrel{d}{=}\sum_{j=1}^J\left\{\sum_{i<j}\Delta\mathbb V(F(a_i))+\Delta\mathbb V(F(a_j))\,\frac{F(t)-F(a_j)}{F(b_j)-F(a_j)}+\sqrt{F(b_j)-F(a_j)}\;\mathbb U_j\!\left(\frac{F(t)-F(a_j)}{F(b_j)-F(a_j)}\right)\right\}1_{(a_j,b_j]}(t),\tag{5.3}$$

where $\Delta\mathbb V(F(a_j)) = \mathbb V(F(b_j))-\mathbb V(F(a_j))$.
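The decomposition (5.3) rests on a property of the Brownian bridge on an interval $[s_1,s_2]$. There, the bridge minus its linear interpolation between the endpoints is uncorrelated with the bridge outside the interval, and hence, being jointly Gaussian, independent of it. This residual also has the covariance of $\sqrt{s_2-s_1}\,\mathbb U((s-s_1)/(s_2-s_1))$. The Python sketch below checks these covariance identities exactly, using the Brownian bridge covariance $\min(s,t)-st$. The interval $[0.3,0.7]$ and the grids are arbitrary illustrative choices.

```python
import numpy as np

def bb_cov(s, t):
    """Covariance matrix of a standard Brownian bridge: min(s, t) - s t."""
    return np.minimum.outer(s, t) - np.outer(s, t)

s1, s2 = 0.3, 0.7
L = s2 - s1
inside = np.linspace(s1, s2, 9)               # inside[0] = s1 and inside[-1] = s2
outside = np.array([0.0, 0.1, 0.2, 0.8, 0.9, 1.0])
grid = np.concatenate([inside, outside])
Sigma = bb_cov(grid, grid)

def unit(j):
    """Coefficient vector selecting V(grid[j])."""
    e = np.zeros(grid.size)
    e[j] = 1.0
    return e

i1, i2 = 0, inside.size - 1                   # positions of s1 and s2 in `grid`
# Row k of A represents R(s) = V(s) - V(s1) - sigma (V(s2) - V(s1)), sigma = (s - s1) / L.
A = np.array([unit(k) - unit(i1) - ((s - s1) / L) * (unit(i2) - unit(i1))
              for k, s in enumerate(inside)])
cov_R = A @ Sigma @ A.T
sigma = (inside - s1) / L
expected = L * (np.minimum.outer(sigma, sigma) - np.outer(sigma, sigma))

# Cross-covariances with V outside the interval and with the increment V(s2) - V(s1).
Bsel = np.array([unit(inside.size + m) for m in range(outside.size)])
cross_outside = A @ Sigma @ Bsel.T
cross_increment = A @ Sigma @ (unit(i2) - unit(i1))
print(np.abs(cov_R - expected).max(), np.abs(cross_outside).max(), np.abs(cross_increment).max())
```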

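Recall that the Grenander operator satisfies $\mathrm{gren}(a+bt+c\,h(t)) = b+c\,\mathrm{gren}(h)(t)$ for constants $a$, $b$ and $c>0$. Also note that $F_0$ is linear on $(a_i,b_i]$ by assumption. Therefore, from Carolan and Dykstra (1999), the limit of $\sqrt n(\hat f_n(x)-f_0(x))$ at a point $x\in(a_i,b_i]$ can be written as

$$\mathrm{gren}_{(a_i,b_i]}\big(\mathbb V(F_0(t))\big) = \frac{\Delta\mathbb V(F_0(a_i))}{b_i-a_i}+\frac{\sqrt{p_i}}{b_i-a_i}\,\mathrm{gren}(\mathbb U_i)\!\left(\frac{t-a_i}{b_i-a_i}\right),\qquad p_i = F_0(b_i)-F_0(a_i),$$

from the above characterization. Finally, note that $\{\Delta\mathbb V(F_0(a_1)),\dots,\Delta\mathbb V(F_0(a_J))\}\stackrel{d}{=}\{Z_1,\dots,Z_J\}$ as in Proposition 5.2.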
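The identity $\mathrm{gren}(a+bt+c\,h(t)) = b+c\,\mathrm{gren}(h)(t)$ for $c>0$ can be checked on a grid. For a function sampled at points $t_0<\dots<t_m$, the slope of the least concave majorant on each cell $(t_{i-1},t_i]$ transforms in exactly this way. The helper `lcm_slopes` below is a hypothetical name for a standard upper-hull computation. The random path $h$ and the constants are arbitrary illustrative choices.

```python
import numpy as np

def lcm_slopes(t, y):
    """Slope of the least concave majorant of the points (t_i, y_i) on each cell (t_{i-1}, t_i]."""
    hull = [0]
    for i in range(1, t.size):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            # Remove i2 if it lies on or below the chord from i1 to i.
            if (y[i2] - y[i1]) * (t[i] - t[i2]) <= (y[i] - y[i2]) * (t[i2] - t[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    hull = np.array(hull)
    seg_slopes = np.diff(y[hull]) / np.diff(t[hull])
    return np.repeat(seg_slopes, np.diff(hull))   # one slope per grid cell

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 401)
h = np.cumsum(rng.normal(scale=0.05, size=t.size))   # a rough random path
a, b, c = 0.3, -1.7, 2.5

lhs = lcm_slopes(t, a + b * t + c * h)
rhs = b + c * lcm_slopes(t, h)
print(np.abs(lhs - rhs).max())
```

The identity holds because $(t,y)\mapsto(t,a+bt+cy)$ with $c>0$ is an affine bijection that maps lines to lines and preserves the vertical ordering. It therefore maps the upper hull onto the upper hull.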



The second step is to show that the process

$$\mathbb S_n(x) = \sqrt n\,(\hat f_n(x)-f_0(x))$$

is tight in $L_\alpha(S)$. For this, we first need a characterization of compact sets in $L_\alpha(S)$ for $\alpha\ge1$. These appear, for example, in Dunford and Schwartz (1958, page 298) (see also Simon (1987)). For $S$ bounded, a set $\mathcal K\subset L_\alpha(S)$ is relatively compact if

(1) $\sup_{f\in\mathcal K}\int_S|f(x)|^\alpha dx<\infty$,

(2) $\lim_{\epsilon\to0}\sup_{f\in\mathcal K}\int_S|f(x+\epsilon)-f(x)|^\alpha dx = 0$.

We then need to show that for all $\epsilon>0$, there exists a compact set $K\subset L_\alpha(S)$ such that $\sup_nP(\mathbb S_n(\cdot)\in K^c)<\epsilon$. We will do this by showing that $E[\int|\mathbb S_n(x)|^\alpha dx]$ is uniformly bounded in $n$ (for sufficiently large values of $n$). This clearly establishes condition (1) above. It is also sufficient for condition (2). This is because $\hat f_n$ is decreasing and $f_0$ is piecewise constant, and therefore, except for a few small regions, $(\hat f_n-f_0)(x)$ is also a decreasing function. To bound $E[\int|\mathbb S_n(x)|^\alpha dx]$ we use the results of Lemmas 5.4 and 5.6 and consider separately $E[\int|(\mathbb S_n(x))_+|^\alpha dx]$ and $E[\int|(\mathbb S_n(x))_-|^\alpha dx]$. To simplify the exposition, we write down the details only on a region $(a,b]$ where $f_0$ is flat. The full result is, of course, immediate.
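Thus, the two bounds follow:

$$E\int_a^b\left|\sqrt n\,(\hat f_n(x)-f_0(x))_-\right|^\alpha dx\le C\int_a^b(b-x)^{-\alpha/2}dx,$$

which is finite for $\alpha<2$. Secondly,

$$\begin{aligned}
E\int_a^b\left|\sqrt n\,(\hat f_n(x)-f_0(x))_+\right|^\alpha dx &\le E\int_a^{a+c_0/n}\left|\sqrt n\,(\hat f_n(x)-f_0(x))\right|^\alpha dx+C\int_{a+c_0/n}^b(x-a)^{-\alpha/2}dx\\
&\le n^{\alpha/2-1}O_p(1)+C\int_a^b(x-a)^{-\alpha/2}dx,
\end{aligned}$$

which is again finite for $\alpha<2$. This completes the proof for $\alpha\in[1,2)$. □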



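Proof of Corollary 5.3. Convergence follows immediately from the continuity, by Hölder's inequality, of the linear functional $f\mapsto\int g(x)f(x)\,dx$. We need only check the final form:

$$\begin{aligned}
\int g(x)\,\mathbb S(x)\,dx &= \sum_{i=1}^J\left\{\frac{Z_i}{b_i-a_i}\int_{a_i}^{b_i}g(x)\,dx+\int_{a_i}^{b_i}g(x)\,\frac{\sqrt{p_i}}{b_i-a_i}\,\mathrm{gren}(\mathbb U_i)\!\left(\frac{x-a_i}{b_i-a_i}\right)dx\right\}\\
&= \sum_{i=1}^J\left\{\frac{Z_i}{b_i-a_i}\int_{a_i}^{b_i}g(x)\,dx+\sqrt{p_i}\int_0^1g((b_i-a_i)u+a_i)\,\mathrm{gren}(\mathbb U_i)(u)\,du\right\}.
\end{aligned}$$

□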
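The last equality is the substitution $x = (b_i-a_i)u+a_i$. The Python sketch below checks it numerically. It uses an arbitrary smooth $g$, an arbitrary decreasing function standing in for $\mathrm{gren}(\mathbb U_i)$, and arbitrary values of $a_i$, $b_i$ and $p_i$. All of these are assumptions made for illustration.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

a_i, b_i, p_i = 0.2, 0.55, 0.3
g = lambda x: np.cos(3 * x) + x ** 2      # any integrable weight function
G = lambda u: 1.0 - 2.0 * u ** 3          # stand-in for gren(U_i), decreasing on [0, 1]

x = np.linspace(a_i, b_i, 200_001)
lhs = trapezoid(g(x) * np.sqrt(p_i) / (b_i - a_i) * G((x - a_i) / (b_i - a_i)), x)

u = np.linspace(0.0, 1.0, 200_001)
rhs = np.sqrt(p_i) * trapezoid(g((b_i - a_i) * u + a_i) * G(u), u)
print(lhs, rhs)
```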



5.3. Piecewise constant KL density. We next consider the case that $\tilde f_0(x)$ can be written in the form given in condition (F1). Let $\mathbb U_1,\dots,\mathbb U_J$ denote independent standard Brownian bridge processes (each defined on $[0,1]$), and for each $j$ define $\mathbb U_j^{\mathrm{mod}}$ as in Remark 2.4 with $I_j = [a_j,b_j]$ replacing $[a,b]$. Also, let $\{Z_1,\dots,Z_J\}$ be an independent multivariate normal with mean zero and covariance $\mathrm{diag}(p)-pp^T$ for $p = (p_1,\dots,p_J)^T$, where $p_j = \tilde F_0(b_j)-\tilde F_0(a_j) = F_0(b_j)-F_0(a_j)$.
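The covariance $\mathrm{diag}(p)-pp^T$ is exactly that of the increments of a Brownian bridge over the cells $(a_j,b_j]$ on the $F_0$ scale, as in the proof of Proposition 5.2. The Python check below uses an arbitrary partition of $[0,1]$. The breakpoints are an illustrative assumption.

```python
import numpy as np

breaks = np.array([0.0, 0.15, 0.4, 0.75, 1.0])   # F0(a_1) = 0 < ... < F0(b_J) = 1
p = np.diff(breaks)                                # p_j = F0(b_j) - F0(a_j)

cov_V = np.minimum.outer(breaks, breaks) - np.outer(breaks, breaks)  # Brownian bridge
D = np.zeros((p.size, breaks.size))                # rows: increments V(b_j) - V(a_j)
for j in range(p.size):
    D[j, j], D[j, j + 1] = -1.0, 1.0
cov_increments = D @ cov_V @ D.T
target = np.diag(p) - np.outer(p, p)
print(np.abs(cov_increments - target).max())
```

The rows of the resulting covariance sum to zero, which reflects the constraint $\sum_j\Delta\mathbb V(F_0(a_j)) = \mathbb V(1)-\mathbb V(0) = 0$.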

Proposition 5.7. Suppose that $f_0$ satisfies conditions (F1) and (F2) with $S_0 = S_f$, and that $g$ satisfies condition (G2). Then $\sqrt n(\hat f_n(x)-\tilde f_0(x))$ converges weakly to

$$\mathbb X(x) = \sum_{j=1}^J\left[\frac{Z_j}{b_j-a_j}+\frac{\sqrt{p_j}}{b_j-a_j}\,\mathrm{gren}(\mathbb U_j^{\mathrm{mod}})\!\left(\frac{x-a_j}{b_j-a_j}\right)\right]1_{(a_j,b_j]}(x)$$

in $L_\alpha(S)$ for $\alpha\in(0,2)$.

Corollary 5.8. Suppose that $f_0$ satisfies conditions (F1) and (F2) with $S_0 = S_f$, and that $g$ satisfies condition (G2). Then $\sqrt n(\hat\mu_n(g)-\mu_0(g))\Rightarrow Y_J$, where

$$Y_J = \sum_{j=1}^J\left\{\bar g_jZ_j+\sqrt{p_j}\int_0^1g_j(u)\,\mathrm{gren}(\mathbb U_j^{\mathrm{mod}})(u)\,du\right\},$$

with $g_j(u)$ and $\bar g_j$ defined as in (3.1).

The proof of these results is very close to that of Proposition 5.2, and we omit any details which are the same. The difference lies in the following modifications to Lemmas 5.4 and 5.5. Note that we add the additional requirement that $f_0$ be bounded below (F2).

Lemma 5.9. Fix a point $x\in S_0$ and let $[a,b]$ denote the largest interval $I$ such that $x\in I$ and $\tilde f_0$ is constant on $I$. Then, for all $c>0$,

$$\sup_{0<u<c/n}\left|\hat f_n(a+u)-\tilde f_0(a+u)\right| = O_p(1).$$



Proof. By the switching relation, it follows that

$$\begin{aligned}
P\left(\hat f_n(a+u/n)-\tilde f_0(a+u/n)\le t\right) &= P\left(\hat f_n(a+u/n)\le\tilde f_0(b)+t\right)\\
&= P\left(\operatorname{argmax}_{z\in[0,1]}\{\mathbb F_n(z)-(\tilde f_0(b)+t)z\}\le a+u/n\right)\\
&= P\left(n\left(\operatorname{argmax}_{z\in[0,1]}\{\mathbb F_n(z)-(\tilde f_0(b)+t)z\}-a\right)\le u\right),
\end{aligned}$$

and the inner process satisfies

$$n\left(\operatorname{argmax}_{z\in[0,1]}\{\mathbb F_n(z)-(\tilde f_0(b)+t)z\}-a\right) = \operatorname{argmax}_{h\ge-na}\left\{\mathbb F_n(a+h/n)-(t+\tilde f_0(b))(a+h/n)\right\} = \operatorname{argmax}_{h\ge-na}\{\mathbb V_n(h)\},$$

where $\mathbb V_n(h) = \mathbb A_n(h)+\mathbb B_n(h)-th$, with $\mathbb A_n(h) = n(\mathbb F_n(a+h/n)-\mathbb F_n(a))-n(F_0(a+h/n)-F_0(a))$ and $\mathbb B_n(h) = n(F_0(a+h/n)-F_0(a))-\tilde f_0(b)h$.

Now, the term $N_n(h) = n(\mathbb F_n(a+h/n)-\mathbb F_n(a))$ is binomial with mean $n(F_0(a+h/n)-F_0(a))\to f_0(a)h$. Therefore, $\mathbb A_n(h)$ converges to $N-f_0(a)h$, where $N$ is a Poisson random variable with mean $f_0(a)h$. A similar argument may be used to show convergence as a process, $\mathbb A_n(h)\Rightarrow\mathbb N(h)-f_0(a)h$, where $\mathbb N(\cdot)$ is a Poisson process with rate $\lambda(h) = f_0(a)$. The second piece, $\mathbb B_n(h)$, satisfies

$$n^{-1}\mathbb B_n(h)\ \begin{cases}=0, & h\in n([a,b]\cap W-a),\\ <0, & h\notin n([a,b]\cap W-a).\end{cases}$$

Thus, if $[a,a+\delta)\cap W = \{a\}$ for all sufficiently small $\delta>0$, then the limit of $\mathbb B_n(h)$ is $0$ if $h = 0$ and is equal to $-\infty$ otherwise (we will call this setting case (A)). If the above assumption is not true (we will call this setting case (B)), then $\lim_{n\to\infty}\mathbb B_n(h) = 0$ for all $h\ge0$. In case (A), it follows that the limit of $\mathbb V_n(h)$ is equal to $0$ at $h = 0$ and is equal to $-\infty$ otherwise, so the argmax of the limit is $0$ here. In case (B), the limit of $\mathbb V_n(h)$ is a centred (a.k.a. compensated) Poisson process with rate $f_0(a)$, minus $th$. We therefore have that, in case (A),

$$P\left(\hat f_n(a+u/n)-\tilde f_0(a+u/n)\le t\right) = P\left(\operatorname{argmax}_{h\ge-na}\{\mathbb V_n(h)\}\le u\right)\to1,$$

and in case (B),

$$P\left(\hat f_n(a+u/n)-\tilde f_0(a+u/n)\le t\right) = P\left(\operatorname{argmax}_{h\ge-na}\{\mathbb V_n(h)\}\le u\right)\to P\left(\operatorname{argmax}_{h\ge0}\{\mathbb N(h)-f_0(a)h-th\}\le u\right),$$

which gives us pointwise convergence in distribution in both cases.

Lastly, note that $\tilde f_0(a+u) = \tilde f_0(b)$ is a constant, and $\hat f_n(a+u)$ is decreasing in $u$ by definition. Therefore, $\sup_{0<u<c/n}|\hat f_n(a+u)-\tilde f_0(a+u)| = \max\{|\hat f_n(a+)-\tilde f_0(b)|,\ |\hat f_n(a+c/n)-\tilde f_0(b)|\}$, and both terms converge as described above. □
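The Poisson limit comes from $N_n(h)$ being binomial with $n$ trials and success probability $F_0(a+h/n)-F_0(a)\approx f_0(a)h/n$. The Python sketch below assumes, for illustration, that $f_0$ is constant and equal to $\lambda = f_0(a)$ just to the right of $a$, so that $N_n(h)\sim\mathrm{Bin}(n,\lambda h/n)$. It then computes the total variation distance to the Poisson$(\lambda h)$ law. This distance shrinks as $n$ grows, and Le Cam's inequality bounds it by $n(\lambda h/n)^2 = (\lambda h)^2/n$. The values of $\lambda$ and $h$ are arbitrary.

```python
import math

def binom_pmf(k, n, q):
    """Binomial(n, q) probability mass at k (zero for k > n since comb(n, k) = 0)."""
    return math.comb(n, k) * q ** k * (1 - q) ** (n - k)

def poisson_pmf(k, mu):
    """Poisson(mu) probability mass at k."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def tv_binom_poisson(n, mu, kmax=60):
    """Total variation distance between Bin(n, mu/n) and Poisson(mu), truncated at kmax."""
    q = mu / n
    return 0.5 * sum(abs(binom_pmf(k, n, q) - poisson_pmf(k, mu)) for k in range(kmax + 1))

lam, h = 1.5, 2.0            # stand-ins for f0(a) and the local time h
mu = lam * h
tvs = {n: tv_binom_poisson(n, mu) for n in (10, 100, 1000)}
print(tvs)
```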

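Lemma 5.10. Suppose that $\tilde f_0$ is flat on $(a,b]$ and fix $x\in(a,b)$. Assume also that $\min_{y\in(a,b]}f_0(y) = \alpha_0>0$, and let $c_0 = \alpha_0/\tilde f_0(b)$. Then, for any $t_0>0$ and $k_0>0$, there exists a constant $c_0' = t_0/(\tilde f_0(b)+t_0/k_0)$ such that

$$P\left(\hat f_n(x)>\tilde f_0(x)+n^{-1/2}t\right)\le\exp\left\{-c_0c_0'\,\frac{t(x-a)}{2}\right\}\quad\text{for all } t\ge t_0,$$

for all $n\ge(k_0/3)^2$. Also, for all $t\in[0,\sqrt n\,\tilde f_0(x)]$,

$$P\left(\hat f_n(x)<\tilde f_0(x)-n^{-1/2}t\right)\le\exp\left\{-c_0\,\frac{t^2(b-x)}{2\tilde f_0(x)}\right\},$$

and otherwise the probability is equal to zero.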
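Proof. Let $\mathbb F_n(a,s) = \mathbb F_n(s)-\mathbb F_n(a)$, and write $\theta = \tilde f_0(x)$. Repeating the argument for the proof of Lemma 5.5, we obtain that

$$\begin{aligned}
P\left(\hat f_n(x)>\theta+n^{-1/2}t\right) &\le P\left(\frac{\mathbb F_n(a,s)}{F_0(a,s)}>\frac{(\theta+n^{-1/2}t)(s-a)}{F_0(a,s)}\ \text{for some } s\in(x,1]\right)\\
&\le P\left(\sup_{s\in(x,1]}\frac{\mathbb F_n(a,s)}{F_0(a,s)}>\inf_{s\in(x,1]}\frac{(\theta+n^{-1/2}t)(s-a)}{F_0(a,s)}\right)\\
&\le P\left(\sup_{s\in(x,1]}\frac{\mathbb F_n(a,s)}{F_0(a,s)}>1+\frac{n^{-1/2}t}{\theta}\right),
\end{aligned}$$

since $\tilde F_0(s)\ge F_0(s)$ with equality at $s = a,b$. Applying the exponential bounds for binomial variables as before, we find that

$$\begin{aligned}
P\left(\sup_{s\in(x,1]}\frac{\mathbb F_n(a,s)}{F_0(a,s)}>1+\frac{n^{-1/2}t}{\theta}\right) &\le\exp\left\{-nF_0(a,x)\,h\left(1+\frac{n^{-1/2}t}{\theta}\right)\right\}\\
&\le\exp\left\{-\inf_{y\in[a,b]}\frac{f_0(y)}{\theta}\cdot\frac{t(x-a)}{2}\cdot\frac{t/\theta}{1+(t/\theta)/(3\sqrt n)}\right\}.
\end{aligned}$$

Therefore, assuming that $\inf_{y\in[a,b]}f_0(y)/\theta\ge c_0>0$, we can repeat the same argument as for Lemma 5.5.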

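We handle the other side in a similar manner. With $\mathbb F_n(s,b) = \mathbb F_n(b)-\mathbb F_n(s)$, and using $F_0(s,b)\ge\tilde F_0(b)-\tilde F_0(s)\ge\theta(b-s)$ for $s\in[0,x)$,

$$\begin{aligned}
P\left(\hat f_n(x)<\theta-n^{-1/2}t\right) &\le P\left(\inf_{s\in[0,x)}\frac{\mathbb F_n(s,b)}{F_0(s,b)}\le\sup_{s\in[0,x)}\frac{(\theta-n^{-1/2}t)(b-s)}{F_0(s,b)}\right)\\
&\le P\left(\inf_{s\in[0,x)}\frac{\mathbb F_n(s,b)}{F_0(s,b)}\le1-\frac{n^{-1/2}t}{\theta}\right).
\end{aligned}$$

We again bound this using the martingale inequality from Groeneboom et al. (1999, Lemma 2.3). For $t\in[0,\sqrt n\,\theta]$, using $\psi(-v)\ge1$ for $v\in[0,1]$ as in the proof of Lemma 5.5,

$$\begin{aligned}
P\left(\inf_{s\in[0,x)}\frac{\mathbb F_n(s,b)}{F_0(s,b)}\le1-\frac{n^{-1/2}t}{\theta}\right) &\le\exp\left\{-nF_0(x,b)\,h\left(1-\frac{n^{-1/2}t}{\theta}\right)\right\}\\
&\le\exp\left\{-\inf_{y\in[a,b]}\frac{f_0(y)}{\theta}\cdot\frac{t^2(b-x)}{2\theta}\right\}.
\end{aligned}$$

□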

5.4. Putting it all together. 



Proof of Theorem 3.1. To illustrate the method of proof, we consider a simplified case. Since $J$ is finite in the form of $\tilde f_0$, the proof easily extends to a general setting. Suppose then that $S_c = [0,a]$ and $S_f = [a,b]$, so that the support is $S_0 = [0,b]$. Furthermore, we assume that on $S_f$ we have $J = 1$. Let $\mathbb U_n(x) = \sqrt n(\mathbb F_n(x)-F_0(x))$. Then

$$\sqrt n(\hat\mu_n(g)-\mu_0(g)) = \int_0^ag(x)\,d\mathbb U_n(x)+\int_a^bg(x)\sqrt n\left(\mathrm{gren}(\mathbb F_n)(x)-\tilde f_0(x)\right)dx+\delta_n,$$

where $\delta_n = \sqrt n\int_0^ag(x)\,d(\hat F_n-\mathbb F_n)(x)$. From assumptions (C1) and (G1) it follows that $\delta_n = o_p(1)$, as in Proposition 5.1.

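Next, let $\mathbb W_n = \mathbb U_n(b)-\mathbb U_n(a)$ and let $\mathbb V_n(x) = \mathbb U_n(x)-\mathbb U_n(a)-\frac{x-a}{b-a}\mathbb W_n$ for $x\in[a,b]$. Lastly, let $\ell(x) = F_0(a)+\tilde f_0(b)(x-a)$. Then for $x\in(a,b]$,

$$\begin{aligned}
\sqrt n\left(\mathrm{gren}(\mathbb F_n)(x)-\tilde f_0(x)\right) &= \sqrt n\left(\mathrm{gren}(\mathbb F_n)(x)-\tilde f_0(b)\right)\\
&= \mathrm{gren}\left(\mathbb U_n+\sqrt n(F_0-\ell)\right)(x)\\
&= \frac{\mathbb W_n}{b-a}+\mathrm{gren}\left(\mathbb V_n+\sqrt n(F_0-\ell)\right)(x),
\end{aligned}$$

and we also define $\mathbb V_n^{\mathrm{mod}} = \mathbb V_n+\sqrt n(F_0-\ell)$. We therefore have

$$\begin{aligned}
\sqrt n(\hat\mu_n(g)-\mu_0(g)) &= \int_0^ag(x)\,d\mathbb U_n(x)+\frac{\mathbb W_n}{b-a}\int_a^bg(x)\,dx+\int_a^bg(x)\,\mathrm{gren}(\mathbb V_n^{\mathrm{mod}})(x)\,dx+o_p(1)\\
&= \int_0^b\bar g(x)\,d\mathbb U_n(x)+\int_a^bg(x)\,\mathrm{gren}(\mathbb V_n^{\mathrm{mod}})(x)\,dx+o_p(1),
\end{aligned}$$

from the definition of $\mathbb W_n$ and of $\bar g$, which equals $g$ on $[0,a]$ and the average $(b-a)^{-1}\int_a^bg(x)\,dx$ on $(a,b]$. The weak limit of $\mathbb V_n^{\mathrm{mod}}$ can be established similarly to the results of Section 2 (see Remark 2.4). The outline of the rest of the proof proceeds as follows: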

(1) Joint weak convergence of $\{\int_0^b\bar g\,d\mathbb U_n,\ \mathbb V_n^{\mathrm{mod}}(x_1),\dots,\mathbb V_n^{\mathrm{mod}}(x_k)\}$ to a Gaussian limit.

(2) Joint weak convergence of $\{\int_0^b\bar g\,d\mathbb U_n,\ \mathrm{gren}(\mathbb V_n^{\mathrm{mod}})(x_1),\dots,\mathrm{gren}(\mathbb V_n^{\mathrm{mod}})(x_k)\}$ via the switching relation.

(3) We have that

$$\mathrm{gren}(\mathbb V_n^{\mathrm{mod}})(x) = \sqrt n\left(\hat f_n(x)-\tilde f_0(x)\right)-\frac{\mathbb W_n}{b-a},$$

where in Proposition 5.7 we showed that the first term on the right hand side is tight in $L_\alpha(a,b)$. The second term on the right hand side is a tight constant, and therefore $\mathrm{gren}(\mathbb V_n^{\mathrm{mod}})(x)$ is also tight in $L_\alpha(a,b)$.

(4) From (1) and (3) we obtain marginal tightness of the terms $\int_0^b\bar g\,d\mathbb U_n$ in $\mathbb R$ and $\mathrm{gren}(\mathbb V_n^{\mathrm{mod}})(\cdot)$ in $L_\alpha(a,b)$, which implies joint tightness in $\mathbb R\times L_\alpha$. The full result now follows by the continuous mapping theorem.

Lastly, we note that since $F_0(z) = \tilde F_0(z)$ at $z = a,b$ and $\bar g$ is constant on $[a,b]$, then

Appendix

6.1. Convergence of decreasing densities. All statements in this subsection are taken from Patilea (1997, 2001) and are included for completeness. Related properties for the class of log-concave densities were studied in Cule and Samworth (2010) and Dümbgen et al. (2011). Recall that any decreasing density on $[0,\infty)$ can be written as a mixture of uniform densities; for $f_0$,

$$f_0(x) = \int_0^\infty\frac1y\,1_{[0,y]}(x)\,d\mu_0(y),$$

where the mixing probability measure $\mu_0$ is unique. Consider a sequence of decreasing densities $f_n$, each with a mixing probability measure $\mu_n$. Also define a distance between two densities in $\mathcal F$ as

$$\rho(f,g) = \inf_{\epsilon>0}\left\{\epsilon:\ f(x+\epsilon)\le g(x)+\epsilon\ \ \forall x>0;\ \ g(x)-\epsilon\le f(x-\epsilon)\ \ \forall x>0\right\},$$

and let $h^2(f,g) = \frac12\int(\sqrt f-\sqrt g)^2dx$ denote the square of the Hellinger distance.
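The mixing measure of a decreasing step density is explicit. Suppose $f$ takes the values $c_1>c_2>\dots>c_k>0$ on $(x_0,x_1],\dots,(x_{k-1},x_k]$ with $x_0 = 0$. Then $\mu$ puts mass $x_j(c_j-c_{j+1})$ at $x_j$, with $c_{k+1} = 0$. The Python sketch below computes these weights for an arbitrary example density, chosen only for illustration. It then checks that the weights form a probability measure and reconstructs $f$ from the mixture.

```python
import numpy as np

x_knots = np.array([0.5, 1.0, 2.5])     # x_1 < x_2 < x_3 (x_0 = 0)
values = np.array([0.9, 0.5, 0.2])      # c_1 > c_2 > c_3; integrates to 1 on these cells

# Mixing weights mu({x_j}) = x_j (c_j - c_{j+1}), with c_{k+1} = 0.
weights = x_knots * (values - np.append(values[1:], 0.0))

def density_from_mixture(x, atoms, w):
    """Evaluate sum_j w_j * (Uniform[0, atoms_j] density) at the points x."""
    x = np.atleast_1d(x)
    return np.sum(w / atoms * (x[:, None] <= atoms), axis=1)

def step_density(x):
    """The original step density: c_j on (x_{j-1}, x_j], zero beyond x_k."""
    x = np.atleast_1d(x)
    idx = np.searchsorted(x_knots, x, side="left")
    return np.where(idx < values.size, values[np.minimum(idx, values.size - 1)], 0.0)

pts = np.array([0.1, 0.5, 0.7, 1.0, 2.0, 2.5, 3.0])
print(weights, weights.sum(), density_from_mixture(pts, x_knots, weights), step_density(pts))
```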
Proposition 6.1 (Patilea (1997, Lemma 2.3.1)). The following are equivalent:

(1) $h^2(f_n,f_0)\to0$,

(2) $\rho(f_n,f_0)\to0$,

(3) $\mu_n\Rightarrow\mu_0$.
6.2. Additional Proofs.



Proof of Proposition 2.2. (1). Let $\ell$ denote a general linear function. Then $\tilde F_0(x_0) = \min\{\ell(x_0):\ell\ge F_0\}$. Now consider $\ell(x)$, the (specific) linear function which passes through the points $(a_0,F_0(a_0))$ and $(b_0,F_0(b_0))$. Then clearly, there is no linear function which always lies above $F_0$ and strictly below $\ell$ on $(a_0,b_0)$. It follows that $\tilde F_0(x) = \ell(x)$ on $(a_0,b_0)$.

Next, suppose that $F_0(a_0)<1$, with $\tilde F_0(a_0) = F_0(a_0)$ but with $\tilde F_0(x)>F_0(x)$ for all $x>a_0$. Since $F_0$ is increasing, it follows that any straight line $\ell(x)$ such that $\ell(a_0) = F_0(a_0)$ and $\ell(x)\ge F_0(x)$ has the form $\ell(x) = F_0(a_0)+m(x-a_0)$ with $m\ge\delta>0$. But then $\tilde F_0(x)\ge F_0(a_0)+\delta(x-a_0)$, which implies that $\lim_{x\to\infty}\tilde F_0(x) = \infty>1$. This is a contradiction, since the constant function $1$ is a concave majorant of the distribution function $F_0$ of the density $f_0$, so that $\tilde F_0\le1$.

The last statement follows immediately since $\tilde F_0(b_0)-\tilde F_0(a_0) = F_0(b_0)-F_0(a_0)$ and $\tilde F_0$ is linear on $(a_0,b_0]$.

(2). Define $\varphi(g) = \int_0^\infty(g(x)-f_0(x))^2dx$. We first show that the function $\bar g$ minimizes $\varphi(g)$ over $\mathcal D$, the space of decreasing nonnegative functions on $\mathbb R_+$, if and only if $\bar G(y) = \int_0^y\bar g(x)\,dx\ge F_0(y)$ for all $y\ge0$, with equality at all points $y$ such that, in the decomposition $\bar g(x) = \int1_{[0,y]}(x)\,d\mu_{\bar g}(y)$, we have $\mu_{\bar g}(\{y\})>0$.

The proof of this is a standard argument. We first calculate the directional derivative of $\varphi(g)$, which is $\nabla_b\varphi(g) = 2\int b(x)(g-f_0)(x)\,dx$, and recall that any nonnegative decreasing function may be written as $g(x) = \int1_{[0,y]}(x)\,d\mu(y)$ for some nonnegative measure $\mu$. Then, if $\varphi(g)\ge\varphi(\bar g)$ for all $g\in\mathcal D$, it follows that $\nabla_b\varphi(\bar g)\ge0$ for all $b$ such that $\bar g+\epsilon b\in\mathcal D$ for sufficiently small $\epsilon>0$, and this holds for all $b = 1_{[0,y]}$. Also note that if $\mu_{\bar g}(\{y\})>0$, then $\bar g\pm\epsilon1_{[0,y]}\in\mathcal D$ for sufficiently small $\epsilon>0$. Since $\nabla_{1_{[0,y]}}\varphi(\bar g) = 2(\bar G(y)-F_0(y))$, the result follows.

For the reverse direction, we need to show that if $\bar g$ is such that $\bar G(y)\ge F_0(y)$ for all $y\ge0$, with equality at all points $y$ such that $\mu_{\bar g}(\{y\})>0$, then it has the smallest $\varphi$ value. Therefore, fix $\bar g$ that satisfies this property, and notice that for any $g\in\mathcal D$ with mixing measure $\mu_g$,

$$\begin{aligned}
\varphi(g)-\varphi(\bar g) &= \int_0^\infty(g-\bar g)^2dx+2\int_0^\infty(g-\bar g)(\bar g-f_0)\,dx\\
&\ge 2\int_0^\infty\!\!\int_0^\infty1_{[0,y]}(x)(\bar g-f_0)(x)\,d\mu_g(y)\,dx-2\int_0^\infty\!\!\int_0^\infty1_{[0,y]}(x)(\bar g-f_0)(x)\,d\mu_{\bar g}(y)\,dx\\
&= 2\int_0^\infty(\bar G-F_0)(y)\,d\mu_g(y)-2\int_0^\infty(\bar G-F_0)(y)\,d\mu_{\bar g}(y)\ \ge\ 0.
\end{aligned}$$

The first statement now follows. It remains to show two things: first, that $\bar G$ is the least concave majorant of $F_0$, and second, that $\lim_{x\to\infty}\bar G(x) = 1$, as from the latter it follows that $\int\bar g\,dx = 1$ and hence minimizing over $\mathcal D$ is the same as minimizing over $\mathcal F$. Notice also that the second statement follows from the first, since $F_0$ is increasing to one. It remains to show that $\bar G$ is the least concave majorant of $F_0$. By way of contradiction, assume that the concave function $\bar G\ge F_0$ is not the least concave majorant. Then there exists a concave function $G$ such that $\bar G\ge G\ge F_0$, and there exists some interval where $\bar G(x)>G(x)\ge F_0(x)$. But then, by the arguments above, $\bar G$ cannot be a straight line on this interval, and hence there exists an $x$ such that $\mu_{\bar g}(\{x\})>0$ and $\bar G(x)>F_0(x)$, contradicting the characterization above. This completes the proof. Note that this could also have been proved using the results of Moreau (1962); see also Hiriart-Urruty and Lemaréchal (2001, pages 46-51).
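Part (2) identifies the $L_2$ projection of $f_0$ onto decreasing functions with the slope of the least concave majorant of $F_0$. For a step function $f_0$ this can be checked directly. Weighted antitonic regression of the cell values, with weights equal to the cell lengths and computed by the pool-adjacent-violators algorithm (PAVA), coincides with the slopes of the least concave majorant of the cumulative sums. The Python sketch below uses an arbitrary non-monotone step function. The helper names are hypothetical.

```python
import numpy as np

def pava_decreasing(y, w):
    """Weighted least-squares projection of y onto non-increasing sequences (PAVA)."""
    blocks = []                                   # each block: [weighted mean, weight, length]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while the non-increasing constraint is violated.
        while len(blocks) >= 2 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, n1 + n2])
    return np.concatenate([np.full(n, m) for m, _, n in blocks])

def lcm_slopes(t, F):
    """Slope of the least concave majorant of (t_i, F_i) on each cell (t_{i-1}, t_i]."""
    hull = [0]
    for i in range(1, t.size):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (F[i2] - F[i1]) * (t[i] - t[i2]) <= (F[i] - F[i2]) * (t[i2] - t[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    hull = np.array(hull)
    return np.repeat(np.diff(F[hull]) / np.diff(t[hull]), np.diff(hull))

rng = np.random.default_rng(2)
widths = rng.uniform(0.05, 0.3, size=12)          # cell lengths
f0 = rng.uniform(0.0, 2.0, size=12)               # a non-monotone step function
t = np.concatenate(([0.0], np.cumsum(widths)))
F0 = np.concatenate(([0.0], np.cumsum(f0 * widths)))

projection = pava_decreasing(f0, widths)
slopes = lcm_slopes(t, F0)
print(np.abs(projection - slopes).max())
```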



(3). This follows from the argument in (2), since $h$ is increasing and therefore $\nabla_{(-h)}\varphi(\tilde f_0)\ge0$.



(4). This follows as in Robertson et al. (1988, Theorem 1.6.1). Since $\Phi$ is convex we have that $\Phi(v)\ge\Phi(u)+\Phi'(u)(v-u)$ for any (sub-)derivative function $\Phi'$. It follows that

$$\int\Phi(f_0-g_0)\,dx\ \ge\ \int\Phi(\tilde f_0-g_0)\,dx+\int\Phi'(\tilde f_0-g_0)(f_0-\tilde f_0)\,dx.$$

The second term is equal to

$$\int\Phi'(\tilde f_0-g_0)(f_0-\tilde f_0)\,dx = \sum_{j=1}^M\int_{(a_j,b_j]}\Phi'(\tilde f_0-g_0)(f_0-\tilde f_0)\,dx,$$

where the intervals $(a_j,b_j]$ denote the sets on which $\tilde f_0$ is constant. But here, $g_0$ is decreasing, and therefore on a fixed $(a_j,b_j]$, $\Phi'(\tilde f_0-g_0)$ is increasing. By (3) above it therefore follows that the second term is nonnegative, proving the result.

(5). Let $\alpha = \sup_{x\ge0}|F_0(x)-G_0(x)|$. Then, by definition, $\tilde F_0(x)+\alpha\ge F_0(x)+\alpha\ge G_0(x)$. Also, since $G_0(x)+\alpha$ is concave and always greater than $F_0(x)$, we have that $G_0(x)+\alpha\ge\tilde F_0(x)$, and the result follows.

□
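Part (5) is a form of Marshall's lemma: replacing $F_0$ by its least concave majorant $\tilde F_0$ does not increase the sup-distance to any concave $G_0$. The Python sketch below checks the discrete version of the argument on a grid. It uses an arbitrary increasing $F_0$ and an arbitrary concave $G_0$, both assumptions made for illustration. The least concave majorant is evaluated by linear interpolation between upper-hull vertices.

```python
import numpy as np

def lcm_values(t, F):
    """Least concave majorant of the points (t_i, F_i), evaluated at the t_i."""
    hull = [0]
    for i in range(1, t.size):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (F[i2] - F[i1]) * (t[i] - t[i2]) <= (F[i] - F[i2]) * (t[i2] - t[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(t, t[hull], F[hull])

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 501)
F0 = np.cumsum(rng.uniform(0.0, 1.0, size=t.size))
F0 = (F0 - F0[0]) / (F0[-1] - F0[0])          # an increasing, wiggly "CDF" on [0, 1]
G0 = 1.0 - (1.0 - t) ** 2                     # a concave comparison function

F0_tilde = lcm_values(t, F0)
lhs = np.abs(F0_tilde - G0).max()             # sup |F0_tilde - G0| on the grid
rhs = np.abs(F0 - G0).max()                   # sup |F0 - G0| on the grid
print(lhs, rhs)
```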



6.2.1. Proof of Lemma 4.2. We note that within this section we make reference to the "bracketing entropy", in the sense of empirical process theory (van der Vaart and Wellner, 1996), which is different from the entropy functional discussed in Section 4.

Define the Bernstein "norm" to be $d_B^2(f,g) = 2\int\left(e^{|f-g|}-1-|f-g|\right)dF_0$, and let $h^2(f,g) = \frac12\int(\sqrt f-\sqrt g)^2dx$ denote the square of the Hellinger distance.

Lemma 6.2. Suppose that $f_0/\tilde f_0\le c_0^2$. For $m_f = \log((f+\tilde f_0)/(2\tilde f_0))$, we have

(1) $d_B^2(m_f,m_{\tilde f_0})\le24c_0^2h^2(f,\tilde f_0)$,

(2) $d_B^2(m_f,m_g)\le48c_0^2h^2(f,g)$.
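Proof. For $x\ge-2$, we have $e^{|x|}-1-|x|\le12(e^{x/2}-1)^2$. Since $m_f\ge-\log2$, it follows that

$$d_B^2(m_f,m_{\tilde f_0})\le24\int\left(e^{m_f/2}-1\right)^2dF_0 = 24\int\left(\sqrt{\frac{f+\tilde f_0}{2\tilde f_0}}-1\right)^2dF_0\le48c_0^2\,h^2\!\left(\frac{f+\tilde f_0}{2},\tilde f_0\right)\le24c_0^2\,h^2(f,\tilde f_0),$$

applying van de Geer (2000, Lemma 4.2, page 48) in the last inequality. Now, suppose that $m_f-m_g\ge0$. Then using a similar argument, we have that $d_B^2(m_f,m_g)$ is bounded above by

$$24\int\left(\sqrt{\frac{f+\tilde f_0}{g+\tilde f_0}}-1\right)^2dF_0 = 48\int\left(\sqrt{\frac{f+\tilde f_0}{2}}-\sqrt{\frac{g+\tilde f_0}{2}}\right)^2\frac{f_0}{g+\tilde f_0}\,dx\le48c_0^2\,h^2(f,g),$$

where the last step uses $f_0/(g+\tilde f_0)\le f_0/\tilde f_0\le c_0^2$ and the convexity of the squared Hellinger distance. If $m_f-m_g<0$, we use instead the bound $e^{|x|}-1-|x|\le12(e^{-x/2}-1)^2$, and obtain $d_B^2(m_f,m_g)\le48c_0^2h^2(g,f) = 48c_0^2h^2(f,g)$.

□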
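Two elementary facts drive Lemma 6.2. The first is the scalar inequality $e^{|x|}-1-|x|\le12(e^{x/2}-1)^2$ for $x\ge-2$. The second is the midpoint bound $h^2((f+g)/2,g)\le\frac12h^2(f,g)$ used in the last inequality of the first display, which follows from convexity of $f\mapsto h^2(f,g)$. The Python sketch below checks both numerically. The grid, the random step densities and the cell widths are arbitrary illustrative choices. The scalar inequality does need a lower limit on $x$: it fails, for instance, at $x = -3$.

```python
import numpy as np

# (i) Scalar inequality behind the Bernstein "norm" bounds, on x in [-2, 20].
x = np.linspace(-2.0, 20.0, 220_001)
lhs = np.exp(np.abs(x)) - 1.0 - np.abs(x)
rhs = 12.0 * (np.exp(x / 2.0) - 1.0) ** 2
scalar_ok = bool(np.all(lhs <= rhs * (1 + 1e-12) + 1e-300))
fails_at_minus_3 = bool(np.exp(3.0) - 4.0 > 12.0 * (np.exp(-1.5) - 1.0) ** 2)

# (ii) Hellinger midpoint bound h^2((f+g)/2, g) <= h^2(f, g) / 2 for step densities.
def hellinger_sq(f, g, widths):
    """h^2(f, g) = 1/2 * integral of (sqrt f - sqrt g)^2 for step functions on given cells."""
    return 0.5 * np.sum((np.sqrt(f) - np.sqrt(g)) ** 2 * widths)

rng = np.random.default_rng(4)
widths = np.full(20, 0.05)                     # 20 cells covering [0, 1]
midpoint_ok = True
for _ in range(200):
    f = rng.uniform(0.0, 1.0, 20)
    f /= np.sum(f * widths)
    g = rng.uniform(0.0, 1.0, 20)
    g /= np.sum(g * widths)
    if hellinger_sq((f + g) / 2, g, widths) > 0.5 * hellinger_sq(f, g, widths) + 1e-15:
        midpoint_ok = False
print(scalar_ok, fails_at_minus_3, midpoint_ok)
```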

Proof of Lemma 4.2. Let $h_0^2(f,\tilde f_0) = \frac12\int\left(\sqrt{f/\tilde f_0}-1\right)^2dF_0$ denote the modified Hellinger distance used in Patilea (2001). Since $\hat f_n$ is the MLE, $\log$ is concave and $\log x\le x-1$. Combining these facts with Patilea (2001, Lemma 2.2, (2.2)) gives

$$\frac14\,h_0^2(\hat f_n,\tilde f_0)\ \le\ \int m_{\hat f_n}\,d(\mathbb F_n-F_0),\qquad m_f = \log\frac{f+\tilde f_0}{2\tilde f_0},$$

and therefore, to prove the lemma, it is sufficient to prove that the term on the right hand side is also of order $O_p(n^{-2/3})$. To this end, define $\mathbb G_n(m_f) = \sqrt n\int m_f(x)\,d(\mathbb F_n-F_0)(x)$ and $\|\mathbb G_n\|_{\mathcal M} = \sup_{m\in\mathcal M}|\mathbb G_n(m)|$, and let $\mathcal M_{\delta,K} = \{m_f:\ h_0(f,\tilde f_0)\le\delta,\ f\in\mathcal F_K\}$,
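with $\mathcal F_K$ denoting the class of positive decreasing functions with bounded support and bounded above by $K$. Now, by Markov's inequality, for any level $\eta>0$,

$$\begin{aligned}
P\left(n^{-1/2}\mathbb G_n(m_{\hat f_n})>\eta\right) &\le P\left(n^{-1/2}\mathbb G_n(m_{\hat f_n})>\eta,\ m_{\hat f_n}\in\mathcal M_{\delta_n,K}\right)+P\left(h_0(\hat f_n,\tilde f_0)>\delta_n\right)+P\left(\hat f_n\notin\mathcal F_K\right)\\
&\le\frac{E\|\mathbb G_n\|_{\mathcal M_{\delta_n,K}}}{n^{1/2}\eta}+P\left(h_0(\hat f_n,\tilde f_0)>\delta_n\right)+P\left(\hat f_n\notin\mathcal F_K\right).
\end{aligned}\tag{6.1}$$

To finish the proof we will use the results of Patilea (2001) and empirical process theory as in Gao and Wellner (2009). To obtain the appropriate bracketing entropy bounds on $E\|\mathbb G_n\|_{\mathcal M_{\delta_n,K}}$, we will use van der Vaart and Wellner (1996, Lemma 3.4.3, page 324). We can do this by the first inequality in Lemma 6.2, from which it follows that if $h_0(\hat f_n,\tilde f_0)\le\delta$ then $d_B(m_{\hat f_n},m_{\tilde f_0})\le5\delta$. Next, the results of Lemma 6.2 further show that a bracket $[\sqrt l,\sqrt u]$ of densities of size $\delta$ leads to a bracket $[m_l,m_u]$ of Bernstein norm size a multiple of $\delta$. Therefore, the bracketing integral of $\mathcal M_{\delta,K}$ under the Bernstein norm can be bounded by the bracketing integral of

$$\mathcal A_{\delta,K} = \left\{\sqrt f:\ f\ \text{a decreasing density with } f(0+)\le K,\ \text{support on } [0,A],\ \text{and } h(f,\tilde f_0)\le\delta\right\}$$

under the $L_2$ norm. By Gao and Wellner (2009, Theorem 4), we have that the bracketing integral satisfies $J_{[\,]}(\delta,\mathcal A_{\delta,K},\|\cdot\|_2)\le C(\log K)^{1/4}\delta^{1/2}$ for some constant $C$, and we assume without loss of generality that $A = 1$. Applying van der Vaart and Wellner (1996, Lemma 3.4.3, page 324), it follows that

$$E\|\mathbb G_n\|_{\mathcal M_{\delta_n,K}}\le C(\log K)^{1/4}\delta_n^{1/2}\left(1+\frac{(\log K)^{1/4}\delta_n^{1/2}}{\delta_n^2\sqrt n}\right).$$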


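We now choose $\delta_n = Mn^{-1/3}$ for $M\ge1$ and plug this into (6.1), taking the threshold there to be $Mn^{-2/3}$. We obtain

$$P\left(n^{-1/2}\mathbb G_n(m_{\hat f_n})>Mn^{-2/3}\right)\le\frac{C(\log K)^{1/4}\delta_n^{1/2}}{Mn^{-1/6}}\left(1+\frac{(\log K)^{1/4}\delta_n^{1/2}}{\delta_n^2\sqrt n}\right)+P\left(h_0(\hat f_n,\tilde f_0)>Mn^{-1/3}\right)+P\left(\hat f_n(0+)>K\right).\tag{6.2}$$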



Now, by definition,

$$\hat f_n(0+) = \sup_{t>0}\frac{\mathbb F_n(t)}{t}\le\sup_{t>0}\frac{\mathbb F_n(t)}{F_0(t)}\cdot\sup_{t>0}\frac{F_0(t)}{t},$$

and the term $\sup_{t>0}\mathbb F_n(t)/F_0(t) = O_p(1)$ (Shorack and Wellner, 1986, Theorem 2, page 345), while $\sup_{t>0}F_0(t)/t = f_0(0+)$, which is bounded by assumption. Therefore, $P(\hat f_n(0+)>K)\le C/K$, and $K$ can be chosen large enough to make the last term in (6.2) arbitrarily small. Using Patilea (2001, Corollary 5.6 with $\epsilon = 1$ and $\alpha = 0$) to handle the middle term, we obtain

$$\lim_{M\to\infty}\limsup_{n\to\infty}P\left(n^{-1/2}\mathbb G_n(m_{\hat f_n})>Mn^{-2/3}\right) = 0,$$

as required. □
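The proof uses the identity $\hat f_n(0+) = \sup_{t>0}\mathbb F_n(t)/t$. It also uses the fact that this quantity has a tail of order $1/K$. The Python sketch below illustrates both for uniform data, which is an assumption chosen so that $f_0(0+) = 1$. It checks that the first slope of the least concave majorant equals $\max_i(i/n)/X_{(i)}$, the discrete form of $\sup_{t>0}\mathbb F_n(t)/t$. It also estimates $P(\hat f_n(0+)>K)$ by Monte Carlo. The sample size, number of replicates and $K = 4$ are arbitrary.

```python
import numpy as np

def lcm_first_slope(sample):
    """fhat_n(0+): the first slope of the least concave majorant of the empirical CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    hull = [0]
    for i in range(1, n + 1):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            if (py[i2] - py[i1]) * (px[i] - px[i2]) <= (py[i] - py[i2]) * (px[i2] - px[i1]):
                hull.pop()
            else:
                break
        hull.append(i)
    return (py[hull[1]] - py[hull[0]]) / (px[hull[1]] - px[hull[0]])

def sup_ecdf_over_t(sample):
    """sup_{t>0} F_n(t) / t, attained at an order statistic: max_i (i/n) / X_(i)."""
    x = np.sort(np.asarray(sample, dtype=float))
    return float(np.max(np.arange(1, x.size + 1) / x.size / x))

rng = np.random.default_rng(5)
samples = rng.uniform(size=(2000, 50))          # f0 = 1 on [0, 1], so f0(0+) = 1
fhat0 = np.array([lcm_first_slope(s) for s in samples])
sups = np.array([sup_ecdf_over_t(s) for s in samples])

K = 4.0
tail_freq = float(np.mean(fhat0 > K))           # Monte Carlo estimate of P(fhat_n(0+) > K)
print(np.abs(fhat0 - sups).max(), tail_freq, 1.0 / K)
```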



We finish off this section by commenting on how the conditions of Lemma 4.2 could be relaxed. The conditions that (A) $f_0/\tilde f_0$ is bounded and (B) $f_0$ has bounded support are necessary in our method of proof. However, one could relax the condition that $f_0$ is bounded above and replace it with both

(C) for some $\epsilon\in(0,1)$, $\int f_0^{\,\epsilon}\,dF_0(x)<\infty$, and

(D) $\limsup_nP\left(\sup_{t>0}\mathbb F_n(t)/t>K_n\right) = 0$ for some $K_n\to\infty$ such that $\log K_n$ grows more slowly than a suitable power of $n$.

The key inequality in (6.2) provides a bound made up of three terms. The second of these is handled by Patilea (2001, Corollary 5.6), and this continues to hold if (A), (B), and (C) are true. We then have to pick $K = K_n$ such that both the first and third terms go to zero, which is exactly condition (D). Condition (C) is discussed in Patilea (2001). Some ideas on how to achieve (D) can be gained from the results in Balabdaoui et al. (2011).